William Erwin Paul (1936-2015)



Legacy is defined as something handed 
down from the past, as from an ancestor 
or predecessor. The legacy of a great sci- 
entist includes not only their seminal 
work, but also the people they have men- 
tored and inspired. Immunologist William 
Erwin Paul, who died on September 18 
of this year, left just such a legacy. 

Bill Paul, as everyone knew him, was 
an extraordinary scientist, administrator, 
and mentor. He handed down a major 
body of scientific work, an internationally 
renowned department at the National 
Institutes of Health (the Laboratory of 
Immunology, which he headed for 45 
years), and many dozens of trainees who 
are still grateful for his mentorship and 
the opportunities that he gave them. As 
two of those mentees, we feel privileged 
to help share his story and the example 
he set for other scientists. Especially in 
an era that seems to reward shameless 
self-promotion and a disdain for teaching 
and administrative activities, Bill’s life and
work across this whole spectrum stand
out in marked contrast to the self-absorp- 
tion that some feel is necessary for a suc- 
cessful career in science. 

Bill was born in Brooklyn in 1936 to
Jack Paul, an immigrant from the
Ukraine, who ran an auto body
shop, and Sylvia Gleicher,
who came from a family
of scientists. He first became
interested in immunology by
reading a collection of essays
by Michael Heidelberger on a
Brooklyn trolley, and following
an internship and residency at
what is now the Boston Medical
Center and the National
Cancer Institute in Bethesda,
MD, he found his way back to
New York and into the lab of
Baruj Benacerraf at NYU.

When Benacerraf accepted a 
position as head of the Labora- 
tory of Immunology (LI) at NIAID 
in Bethesda, Bill moved with 
him, and he remained at NIH 
the rest of his life. Within a 
year, Benacerraf was recruited 
to Harvard and Bill became 
chief of the laboratory at the
tender age of 34 in 1970.

At the NIH, there are different styles
with which a lab chief can manage his or
her charges, with some choosing to view
everyone as an extension of their own
research agenda. But here, as in many
other ways, Bill set himself apart by
allowing each group leader the freedom to
pursue his or her own ideas while he
shouldered the administrative burden for
45 years. It’s hard to imagine anyone
doing that these days.

All the while, he ran a group of his own
and had many singular scientific
accomplishments, first in cellular
immunology, where he and his colleagues
discovered IL-4, a key cytokine
involved in T cell activity, particularly in
allergic and inflammatory diseases.
This became the major focus of his lab,
which he completely converted to
molecular biology in order to clone and
characterize the IL-4 gene and its
transcriptional regulation.

But another opportunity to serve in
an important administrative role arose
in 1993, when the NIH came under
great pressure from AIDS activist groups
to move more quickly to develop an
effective treatment. In response, the
decision was made to create an Office
of AIDS Research at the NIH in
order to coordinate HIV research. Here, 
Bill’s calming presence and deeply 
analytical skills were put to great effect, 
earning the trust of the activist com- 
munity and helping to organize the 
many scientific moving parts to be 
more effective. Out of this effort came 
the amazingly effective anti-HIV thera- 
pies that are used by more than 15 
million people today, transforming what 
was a death sentence into a chronic dis- 
ease. Bill was also very proud that, 
during this period, he was able to 
directly help persuade President Clinton 
to establish the Vaccine Research 
Center at NIH, which has been an impor- 
tant force in advancing both vaccine 
research and candidate vaccines into 
the clinic. 

Somehow, he found the time to participate
in a myriad of national and international
service activities, serving on prize
committees and review panels and as
President of the American Society for
Clinical Investigation and the American
Association of Immunologists. He
also developed an advanced textbook 
(Fundamental Immunology), published in 
1984 and carried through seven editions. 
It has been rightly called the advanced 
text for any serious student of the sub- 
ject. Clearly, he had a passion for the 
subject and a passion to 
communicate that enthusiasm 
in almost every possible way. 

On a personal level, he was
everyone’s dream mentor— 
immensely thoughtful and 
knowledgeable, kind, and 
optimistic, but also very 
rigorous. He gave those who 
worked with him the freedom 
and encouragement to do their 
best. In one of our cases 
(M.M.D.), we came to the LI 
largely ignorant of cellular 
immunology, which at that 
time was largely separate 
from the molecular version 
(so separate that Niels Jerne 
referred to “cis” and “trans” 
immunologists), but with clon- 
ing skills that were rare at 
that time and the idea to use 
said skills to investigate gene 
expression differences be- 
tween B and T lymphocytes. 
Bill, while professing to have no expertise
to offer this unusual endeavor, neverthe- 
less asked very astute questions and 
also (M.M.D. later learned) queried some 
noted molecular immunologists as to 
whether this was advisable. Apparently it 
was, because he gave it his go-ahead 
and, as the results started to look prom- 
ising, steered people looking for inter- 
esting projects toward it, leading to 
a small but effective branch of the lab 
that, with the addition of Steve Hedrick, 
endeavored to clone the genes behind 
the near legendary T cell receptor— key 
to understanding T cell specificity, the 
lack of which was a major stumbling block 
in the field at that time. All through this,
Bill offered constant encouragement 
and those very good questions. But 
most remarkable of all, when that effort 
was actually successful despite intense 
competition — akin to winning a major lot- 
tery— Bill refused to take any credit, even 
though M.M.D. was all the while a post- 
doc under his tutelage. But this was who 
he was: a scientist’s scientist and an 
example to us all. 

L.H.G. came to Bill’s lab in 1979 as 
an MD postdoctoral fellow with only a 
year of research experience, needing all 
of the training and mentoring she could 
get. In Bill, she found someone who 
believed in providing lots of both. Since 
she was also 8 months pregnant, Bill
must have wondered exactly how much 
work he was going to get out of her. 
When she returned to work a week after 
her daughter was born, perhaps he 
breathed a sigh of relief. 



In science, there are relatively safe 
projects and then there are somewhat 
risky projects. But Bill suggested that 
she take on an extremely risky project 
that no one, probably including Bill, 
thought could possibly work. But he was 
an incredible optimist— one of the things 
that we loved most about him. So L.H.G. 
plunged ahead and, to everyone’s aston- 
ishment, succeeded in generating mutant 
class II MHC antigen-presenting cells. 
The results turned out to be quite important
in understanding the structure-function
relationship between the T cell receptor
and its MHC ligands.

Thinking back on this, practically every- 
thing Bill did turned out to be important. 
He had a great nose— or a green thumb, 
if you will— for sniffing out what would 
matter. For him, if a project didn’t reach 
for the moon, or at least the stars, then 
it wasn’t worth doing. This important 
lesson has inspired us throughout our 
careers to dare to take big risks where 
the payoff for science would be huge 
even though the likelihood of success 
might be small. 

Bill was also one of those rare individ- 
uals at the time who always supported 
women. He had many female postdocs, 
who called themselves Bill’s lymphettes. 
When L.H.G. insisted on returning to the 
lab the day after she delivered her second 
child just 2 years later, Bill didn’t blink an
eye. Not many mentors would have borne 
with equanimity the arrival of a postdoc 
who was 8 months pregnant and who 
then proceeded to have a second child. 
Many young women nowadays might 



find it hard to believe how little under- 
standing professional women could 
receive in those days when they tried to 
combine career and family. In that regard, 
as in many others, Bill was a revolutionary.
It may have helped that Marilyn Paul, 
Bill’s beloved and amazing wife, was 
herself a very strong and also supportive 
personality. 

In conclusion, it’s hard for something as 
impersonal as words to give a sense of a 
life as richly lived, as deeply beneficial to 
so many people, as the life of Bill Paul. 
Bill’s legacy was not only his great contri- 
butions to our understanding of the way 
the immune system works, but also his 
immeasurably important contributions to 
the scientific and personal development 
of the many people who had the good for- 
tune to work with him. The great English 
poet Robert Browning once said that the 
best measure of the height of a man is 
the length of the shadow his mind casts. 
By this measure, Bill Paul was a giant.
Unveiling the Code of Life

Life’s Greatest Secret: The Race to Crack the Genetic Code 

Author: Matthew Cobb 

Basic Books: New York, NY, USA (2015). 464 pp. $29.99 



There is no shortage of written contribu- 
tions on various aspects of the history of 
genetics and molecular biology— The 
Eighth Day of Creation: The Makers of 
the Revolution in Biology stands out as a
premier example, memorably focusing 
on reaching the intelligent lay reader. 
Matthew Cobb’s contribution is unques- 
tionably written for scientists. But it too 
deserves adulation as a masterwork. 

The introductory chapter Genes before 
DNA reminds readers of the familiar early 
pioneers of genetics, including Gregor 
Mendel, Hugo de Vries, Theodor Boveri, 
Wilhelm Johannsen, Thomas Hunt Mor- 
gan, and Nikolai Koltsov. This chapter 
also thoughtfully informs us of the impor- 
tant intellectual contributions of the 
eminent physicist Erwin Schrödinger,
who is credited with the first notion of a 
“code script” when talking about how 
genes operate. The succeeding chapter 
called Information Is Everywhere may 
tempt all but the most intellectually ori- 
ented readers to toss the book aside. 
My advice is to curb this impulse should 
it arise! 

This reviewer was particularly entranced 
by Cobb’s treatment of the famous 
transformation experiments executed by 
Oswald Avery and his colleagues Colin 
MacLeod and Maclyn McCarty in the
mid-1940s that led them to the conclusion
that genetic information resides in DNA 
rather than proteins, the latter being the 
alternative and widely held view in the ge- 
netics community. The erudition of this 
chapter lies in an element that particularly 
distinguishes Cobb’s writing, namely the 
historical depth that he has brought to 
this literary contribution. Most, if not all, 
students are taught that Avery was the 
first to experimentally demonstrate that 
genes are made of DNA. But few are likely 
aware of the enormous challenges that 
he had to endure from the unshakable 
adherence to the entrenched notion that 
genes are made of protein and that, 
if DNA was in any way involved in gene
action, it was surely by way of some sub- 



sidiary (perhaps structural) role. Besides, 
DNA was then considered an utterly 
boring molecule equipped with just the 
four bases, deoxyribose and phosphate, 
hardly persuasive “to bring about the 
almost infinitely different effects produced 
by genes.” Cobb informs us that both the
experiments of Avery and his group and 
the equally famous later experiments of 
Alfred Hershey and Martha Chase were 
persistently dogged by a criticism that 
was essentially impossible to definitively 
address, namely that they could never 
definitively prove that their transforming 
principle contained absolutely no protein. 
Cobb reveals that the influential biologist 
Alfred Mirsky, together with Arthur Polli- 
ster, published a widely read article that 
stressed that “there can be little doubt in 
the mind of anyone who has prepared nu- 
cleic acids that traces of protein probably 
remain in even the best preparations and 
that as much as 1 or 2 per cent of protein 
could be present in a preparation of pure, 
protein-free nucleic acid.” Even the
celebrated geneticist Hermann Muller wrote
in an article that he was personally
convinced by Mirsky’s suggestion that
undetected “genetic proteins floating 
free in the medium caused Avery’s 
results.” 

Cobb points out that, regardless of “the 
overwhelming evidence, all of which sug- 
gested that the transforming principle was 
made of DNA and that genes may be too,” 
the final paragraph of the paper in which 
Avery and his colleagues announced their 
startling findings “opened with a phrase 
that suggested that the team was not 
quite as confident as they ought to have 
been.” “It is of course possible that 
the biological activity of the substance 
described here is not an inherent property 
of the nucleic acid but is due to minute 
amounts of some other substance ad- 
sorbed to it or so intimately associated 
with it as to escape detection,” Avery 
et al. wrote. But, they also boldly stated, 
“there is no evidence in favor of such a hy- 
pothesis that is chiefly supported by the 
traditional view that nucleic acids are 
devoid of biological specificity.” Distress- 
ingly, this traditional view hung around in 
the minds of many scientists, even prom- 
inent ones, for years, causing Avery to 
suffer frank clinical depression. And 
when Avery died in 1955, “the brief obitu- 
ary that appeared in the New York Times 
did not even mention DNA.” 

Regardless, over the years, Avery’s 
contention stimulated the thinking and 
work of an increasing cadre of established 
and future stars in genetics, including 
Joshua Lederberg. Cobb notes that the 
journal Nature described Avery’s work in 
glowing terms, and a (small) number of 
scientists were in fact highly complimentary.
In October 1944, the New York
Academy of Medicine awarded Avery its 
Gold Medal. And in 1945, the Royal 
Society of London graced his experi- 
mental achievements with the Copley 
Medal. But Avery was never graced with 
the highly deserved distinction of Nobel 
Laureate. 

Cobb interrupts the progress of his his- 
tory with another epistemological chapter 
dubbed The Age of Control in which he 
outlines the discipline of cybernetics, a 
term that is relevant to the study of
systems, including mechanical, physical, 
biological, cognitive, and social systems. 
Cybernetics is applicable when a system 
being analyzed incorporates a closed 
signaling loop— i.e., where action by the 
system generates some change in its
environment and that change is reflected 
in the system in some manner (feedback) 
that triggers a system change. The intent 
of this chapter is to alert the reader to 
the emergence of cybernetics when feed- 
back mechanisms in molecular biology 
were discovered by later makers and 
shakers in molecular biology, notably 
from the exquisite experiments on gene 
regulation executed by the famous 
French duo of François Jacob and Jacques
Monod described in a later chapter.

Much of the rest of the book covers the 
history of the elucidation of the structure
of DNA (a topic well covered in James D. 
Watson’s The Double Helix) and the pur- 
suit of the Holy Grail— deciphering the ge- 
netic code. Cobb peppers his writing 
of the latter seminal breakthrough with 
delightfully interesting anecdotal informa- 
tion that displays the depth of research for 
his book. He informs the reader: 

“On March 19, 1953, about two weeks 
after the double helix model had been 
completed, Francis Crick wrote a letter 
to his 12-year old son, Michael, who was 
at boarding school. Crick told Michael 
what he had discovered, and included a 
sketch of the structure of DNA. He then 
went on to explain the significance of the 
double helix. ‘It’s like a code,’ Crick wrote 
to Michael. ‘If you are given one set of let- 
ters you can write down the others. Now 
we believe that the D.N.A. is a code. 
That is, the order of the bases (the letters) 
makes one gene different from another 
gene (just as one page of print is different 
from another).’”

Cobb relates that, while the significance of
the sequence of bases in a DNA chain
had been a matter of speculation for some time, this
letter to his very young son was the first 
time that anyone had stated in writing 
that DNA contains a code. In 2013, the 



letter fetched $6 million at an auction! 
Crick’s leadership, intellectual genius, 
and scintillating personality during the 
period in which the genetic code was 
slowly but surely unraveled leap majesti- 
cally from Cobb’s pen. 

Equally arresting and presumably little- 
known historical anecdotes surface when 
Cobb relates that, after Marshall Niren- 
berg (an unknown scientist to most of 
the molecular biology community) re- 
ported his use of homopolymers to 
elucidate the genetic code in a 10 min 
talk at the Fifth International Congress 
of Biochemistry in Moscow in August 
1961, Matt Meselson informed Crick of 
these electrifying experimental results, 
prompting Crick to invite Nirenberg to present
his findings again in a longer plenary 
talk at a symposium that Crick was to 
chair the following day. Following his sec- 
ond presentation, Nirenberg was so grat- 
ified and elated he was prompted to 
comment: 

“The reception was really remarkable, 
fantastic. I remember Matt Meselson, 
who was sitting right up front. I didn’t 
know him at the time, but he was so over- 
joyed about hearing this stuff that he 
impulsively jumped up, grabbed my 
hand, and actually hugged me and 
congratulated me for doing that. I could 
have been part of a rock band or some- 
thing. That really meant an awful lot to 
me. It really meant more to me than all 
kinds of awards and what-not because it 
was genuine and spontaneous.” 

The work in the Nirenberg laboratory 
and that of a competing laboratory led 
by the Spanish-born biochemist Severo
Ochoa contributed mightily to decipher- 
ing the genetic code. In 1968, Nirenberg 
shared the Nobel prize in physiology or 
medicine with Robert Holley and Gobind 
Khorana. 



Cobb concludes his book with a section 
entitled Update that details the history of 
molecular genetics and molecular biology 
to the present time, including the discov- 
ery of introns, the use of the polymerase 
chain reaction (PCR), sequencing entire 
genomes, paleogenomics, population ge- 
netics, evolutionary genetics, genetic en- 
gineering, the potential for synthetic 
biology, and more. The book also features 
a pleasing gallery of photos. 

The dominance of individual brilliance 
in molecular genetics so remarkably dis- 
played by Francis Crick and Sydney Brenner
during the 1950s and
1960s is fading all too rapidly. In his
conclusion, Cobb addresses the perils of 
“big science,” especially in the field of ge- 
nomics, pointing to a paper in Nature Ge- 
netics published in 2014 that listed 440 
authors! Cobb notes “It is now becoming 
commonplace, changing the relationship 
of individual scientists to the work they 
produce, rendering each person’s contri- 
bution relatively minor and highly spe- 
cific.” This threatening shift in the sociol- 
ogy of science cries out for attention if 
molecular biology is to regain its former 
attraction to college students interested 
in pursuing careers in disciplines exempli- 
fied by modern day genomics. 

All in all, Matthew Cobb, who hails from 
the University of Manchester— which 
notably includes a Center for the History 
of Science, Technology and Medicine 
and whose eclectic historical contribu- 
tions include the efforts of the French 
resistance during WWII — has presented 
the scientific community, and perhaps members of
the non-scientific community, with an erudite
and comprehensive history that should be 
required reading for all graduate students 
in the disciplines of genetics and molecu- 
lar biology and, most certainly, students 
of the history of science. 

Errol C. Friedberg1,*

1Department of Pathology, University of
Texas Southwestern Medical Center, Dallas,
TX 75390-9072, USA
*Correspondence: errol.friedberg@utsouthwestern.edu

http://dx.doi.org/10.1016/j.cell.2015.10.018



532 Cell 163, October 22, 2015 ©2015 Elsevier Inc. 




Leading Edge 

Bench to Bedside 



Treatment for Hypoactive Sexual Desire 

James G. Pfaus 

Center for Studies in Behavioral Neurobiology, Department of Psychology, Concordia University, Montreal, QC H4B 1R6, Canada

Correspondence: jim.pfaus@concordia.ca

http://dx.doi.org/10.1016/j.cell.2015.10.015



[Figure: Flibanserin, a 5-HT1A agonist and 5-HT2A antagonist, acting in the prefrontal cortex and brainstem (glutamate). Panel labels: disinhibited DA neurons, disinhibited NE neurons, inhibited 5-HT neurons. Outcomes: increased sexual desire, decreased sexual distress, modest increase in sexually satisfying events.]

Flibanserin acts at cortical, limbic, hypothalamic, and 
brainstem nuclei to inhibit serotonin release by binding to 
5-HT1A autoreceptors and block postsynaptic action of 
serotonin at 5-HT2A receptors. This gradually disinhibits 
the turnover of other monoamines like dopamine and 
noradrenaline that are critical for sexual desire. 
CELLULAR TARGETS

1995

BIMT 17 (flibanserin) is a 5-HT2A receptor
antagonist and a 5-HT1A receptor full agonist

1997

Flibanserin proposed as a potentially
fast-acting antidepressant

1999

Estimates suggest 43% of women have
experienced some form of sexual
desire disorder

2009

Boehringer Ingelheim files FDA application
to treat HSDD with flibanserin

2010

Citing weak effects on daily diary
desire reports, FDA committee
rejects flibanserin

2011

FDA panel accepts that
HSDD is an unmet
medical need

2015

After examination of
new data, flibanserin
approved by the FDA

References for further reading are available with this article online: www.cell.com/cell/abstract/S0092-8674(15)01329-X 
LINE retrotransposons actively shape mammalian genomes. Denli et al. reveal a new open reading
frame, ORF0, on the antisense strand of human LINE-1 encoding a small regulatory protein. This
finding may represent the birth of an emerging retrotransposon gene that can adopt various fates,
as it can be fused to adjacent host sequences.



Long interspersed repeats or LINEs 
are retrotransposons that have littered 
mammalian genomes since their diver- 
gence from other vertebrates hundreds 
of millions of years ago. The human version 
of this sequence, LINE-1, is active in germlines,
early embryos, and the brain, as well
as in selected human cancers (Goodier,
2014). LINEs are known as potent
agents of genome instability by mobilizing 
themselves, other sequences that do not 
encode reverse transcription machinery, 
such as short interspersed repeats 
(SINEs), and a multitude of processed 
pseudogenes (Burns and Boeke, 2012). 
Two LINE-1 genes, ORF1 and ORF2, are 
encoded by the human LINE-1 sequence, 
and both are directly involved in retrotrans- 
position (Moran et al., 1996; Figure 1). 
ORF1 encodes a nucleic acid binding 
protein that avidly binds single-stranded 
RNA in the ribonucleoprotein particle that 
serves as a retrotransposition 
intermediate, whereas ORF2 
specifies a polyprotein with 
both endonuclease and 
reverse transcriptase activ- 
ities. The ORF1 and ORF2 se- 
quences were defined nearly 
30 years ago, when the 
consensus sequence of the 
active subfamily of LINEs was 
deduced (Scott et al., 1987). 

It therefore comes as a big 
surprise that this intensively 
studied element in fact sports 
a third open reading frame, 
dubbed “ORF0,” within the 5'
UTR of the LINE-1 transcript
and on the opposite strand
from the ORF1 and ORF2 structural
genes (Denli et al., 2015
[this issue of Cell]). How could 



something so obvious have been missed 
for so long? There are perhaps three major 
reasons. Unlike the two ORFs that we 
know so well, it is encoded on the antisense
strand. Moreover, ORF0 is very
short, encoding a 71 amino acid peptide,
which is in marked contrast to ORFs 1
and 2 that collectively span nearly
5,000 bp. Finally, unlike ORFs 1 and 2,
the ORF0 sequence is conserved only
within the primate lineage, a strong argu- 
ment that the sequence does not play a 
direct constitutive role for retrotransposi- 
tion. 

The super-short nature of ORF0 and its
overall lack of conservation rightly call
into question whether or not this is really 
a gene at all or just an accidental juxtapo- 
sition of codons. It is presumably a rela- 
tively newborn gene of the primate line- 
age, albeit one inhabiting the genome of 
a DNA parasite rather than that of the 



primates themselves. Denli et al. (2015)
brought multiple lines of evidence forward
to support the view that ORF0 is in fact functional.
The LINE-1 sequence contains two pro- 
moters, the best known of which initiates 
at the first base pair of the element. It is
the promoter responsible for expression
of ORFs 1 and 2, and its transcript serves
as the template for retrotransposition.
A second antisense
promoter drives expression out of the left
end of the element, and it has been adopted
as a promoter by multiple human
genes (Mätlik et al., 2006; Figure 1).
ORF0 is well positioned to have its expression
driven by this antisense promoter.
Moreover, insertion of reporter genes and
tags in frame with ORF0 in an otherwise
intact and unremarkable LINE-1 element
led to gene expression in embryonic stem
cells, and mutation of the ORF0 AUG initiator
codon eliminated such expression.
In addition, the GFP-ORF0 fusion protein
was localized to the nucleus.
Interestingly, ORF0
encompasses one or two
splice donor sites previously
observed to be fused to splice
acceptors inside or, more
commonly, outside various
copies of the LINE-1 element.
Capped and ribosome-occupied
ORF0 transcripts were
readily identified and were far
more abundant in stem cells
than in fibroblasts, as is also
the case for full-length LINE-1
ORF1-2 transcripts. Further
ribosome footprinting and
RNA-seq analyses identified
fusion transcripts between
ORF0 and at least five human
genes. In addition, phyloge- 
netic analyses showed that 




Figure 1. Relationship between ORF0 and Other Key Elements in
Human LINE-1 Retrotransposon

The schematic (roughly to scale) shows the 5' UTR region containing two
promoters, ORF0, its downstream exon and signals for multiple splice isoforms.
Notably, the size of ORF0 is remarkably small, and it can be joined to a
downstream exon to produce a fusion protein product.
ORF0 could be reliably detected in ~50
copies in old world monkeys and thou- 
sands of copies in humans and great 
apes, but not in new world monkeys. 

A critical question is whether ORF0 protein
can be detected in non-engineered
primate cells. Denli et al. (2015) provided
evidence for the existence of the ORF0
protein using a combination of immunoprecipitation
and mass spectrometry
(MS). They overcame the issue of the
mismatch between low ORF0 protein
concentration and the limited dynamic
range and sensitivity of MS by using polyclonal
antibodies to enrich ORF0 protein.
A second issue often encountered in MS 
analysis of short proteins is that, after 
digestion, there are often very few if any 
peptides amenable to MS sequencing, 
which need to be of a just-right length 
and well fragmented so that their se- 
quences can be determined with high 
confidence. Denli et al. (2015) were able 
to obtain extensive fragmentation infor- 
mation almost entirely covering three 
tryptic peptides corresponding to ORF0



and its second exon (Figure 1). The MS 
detection was carried out on both overexpressed
ORF0 protein and endogenous
protein produced in human cells. 

Just because a sequence is expressed 
does not make it a gene that encodes 
a functional protein. In this study, Denli 
et al. (2015) produced evidence suggesting
a regulatory role for ORF0-encoded
protein. Previous work had shown that
an element driven by a promoter
completely lacking LINE-1 sequences
was active in retrotransposition, arguing
strongly against a required role in cis. However,
such a function might be provided
in trans. Indeed, Denli et al. (2015) used
a CAG-LINE-1 retrotransposition reporter
element similar to those described earlier
(Moran et al., 1996) to evaluate hopping
frequency and showed that overexpression
of ORF0 from a separate plasmid
enhanced retrotransposition frequency
by 41%. Thus, it seems likely that ORF0
plays some positive regulatory role in the 
retrotransposition process. It remains to 
be determined whether such a role of 



ORF0 is in any way related to its capacity
to generate fusion proteins containing
host genomic sequences. Moreover, it
would be interesting to see whether, and
if so how, the ORF0 protein contributes
mechanistically to LINE-1 retrotransposition.
Using mutation libraries and deep sequencing, Aakre et al. study the evolution of protein-protein
interactions using a toxin-antitoxin model. The results indicate probable trajectories via “interme- 
diate” proteins that are promiscuous, thus avoiding transitions via non-interactions. These results 
extend observations about other biological interactions and enzyme evolution, suggesting broadly 
general principles. 



HEAD HEAL TEAL TELL TALL TAIL. This 
word game devised by Lewis Carroll re- 
quires moving from one word to another 
while keeping all intermediate words 
meaningful. It offers a nice analogy for a 
protein evolution model, where words 
represent functional proteins and muta- 



tions are word-to-word moves (Smith, 
1970). It also represents one side of a 
debate, whether mutational navigation in 
sequence space from one protein func- 
tion to another traverses via evolutionary 
intermediates that retain some functional 
features along the pathway to a new func- 



tion. Because the evolution of new speci- 
ficities in protein-protein interactions 
requires changes in at least two partners, 
the challenges for retaining functions that 
are vital for cell survival while evolving 
new ones may be more constrained 
(and more complicated) than in other 
Figure 1. Reprogramming Specificity via Promiscuous Intermediates

(A) A Venn diagram showing the overlap between three sample sets of ParD3
antitoxin variants. Sequence logos represent the diversity in four specificity-
determining positions that are overrepresented in ParD3 antitoxin variants that
fit either ParE3 (the native toxin, upper), ParE2 (another toxin, lower) or both
(middle). Amino-acid colors differ for each position; the darker the color, the
more prevalent it is in the promiscuous motif. Grey residues represent cases in
which the E3-specific logo or E2-specific logo includes residues that do not
appear in the corresponding position in the promiscuous logo.

(B) The model suggested by Aakre et al. for reprogramming protein-protein
interaction specificity. An enabling, promiscuity-exerting mutation of protein X (X to
X’) allows protein Y to change its specificity determinant (Y to Y’) and still bind
both Xs. Protein X’ then mutates (X’ to X”) with an increase in specificity toward
Y’. The protein-protein interactions are thus maintained throughout the
evolutionary process. Features of this figure were adapted from Aakre et al. (2015).



systems. How is cross-reac- 
tion between the evolving, ho- 
mologous interaction partners 
evaded? What mutational tra- 
jectories do partners traverse 
while avoiding intermediate 
steps that may have negative 
biological consequences? In 
this issue of Cell, Aakre et al. 

(2015) utilize a model of 
toxin-antitoxin (TA) protein in- 
teractions that are essential 
for bacterial survival to study 
the problem systematically. 

Their results provide evidence 
for the preference for evolu- 
tionary paths involving biolog- 
ically functional promiscuous 
intermediate steps, rather 
than switch-like trajectories 
that include non-interacting 
intermediates. 
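
The word-ladder analogy that opens this piece can be made concrete in a few lines of Python. This is only an illustrative sketch: the function names and the tiny word list are assumptions introduced here, with "meaningful word" standing in for "functional protein", a property that is far harder to test in a real system.

```python
def differs_by_one(a: str, b: str) -> bool:
    """True if two equal-length words differ at exactly one position."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1


def is_viable_ladder(path, meaningful_words) -> bool:
    """Every word must be meaningful and every step a single substitution."""
    if not path or any(word not in meaningful_words for word in path):
        return False
    return all(differs_by_one(a, b) for a, b in zip(path, path[1:]))


# Hypothetical mini-dictionary standing in for the set of functional proteins.
WORDS = {"HEAD", "HEAL", "TEAL", "TELL", "TALL", "TAIL"}

# Carroll's ladder: every intermediate is a real word.
carroll = ["HEAD", "HEAL", "TEAL", "TELL", "TALL", "TAIL"]
# A ladder of the same length whose intermediates are not words.
shortcut = ["HEAD", "TEAD", "TELD", "TALD", "TAID", "TAIL"]

print(is_viable_ladder(carroll, WORDS))
print(is_viable_ladder(shortcut, WORDS))
```

The first ladder passes and the second fails. Both make one substitution per step, but only the first keeps every intermediate "functional", which is the constraint the analogy is meant to capture.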

Bacteria typically include 
several chromosomally en- 
coded paralogs of TA pairs in 
which an antitoxin neutralizes 
the toxin by interacting with 
it. Aakre et al. focus on the 
discrete problem of emer- 
gence of a new TA pair from 
an existing one. The specific 
ParD3/ParE3 interaction pair 
they chose for these experiments exhibits 
systemwide mutual exclusiveness, with 
almost no cross-reaction with other TA 
pairs that could complicate retrieval and 
interpretation of results. 

Aided by a high-resolution crystal 
structure they solved for a specific 
ParD3/ParE3 complex, a 4-residue motif 
sufficient for reprogramming interaction 
specificity was identified. A mutation li- 
brary was then constructed in which the 
one invariant Trp was retained while the 
three co-evolving positions of the motif 
were varied using only residues often 
found in natural ParD homologs. Fitness 
was approximated for intermediate 
stages in the path between one specific 
TA pair and another of different specificity 
using a competitive growth assay that 
allowed recovery of successful variants 
enriched over the time course of the ex- 
periments. For the ParD3 library, 252 var- 
iants were recovered that could effec- 
tively antagonize ParE3. As expected, 
repeating the competitive assay with 
another toxin, ParE2, produced a different 



set of 151 variants that neutralize this second 
toxin. 

The two antitoxin specificities are typi- 
fied by distinct motifs in ParD3 specificity 
determinants (Figure 1A). While position 
59 appears largely diffident in its varia- 
tion pattern, position 61 is enriched for 
negatively charged residues for ParE3- 
specific variants, in contrast to small hy- 
drophobic/positively charged residues in 
the ParE2-specific variants. Similarly, po- 
sition 64 is enriched for positively 
charged residues in ParE3-specific vari- 
ants, compared to small hydrophobic 
residues in ParE2-specific variants. 
Importantly, 31 variants exhibit dual- 
specificity toward ParE2/3, characterized 
by ParE3-like specificity at position 61 
and ParE2-like specificity at position 64. 
Strikingly, evaluation of all alternative 
mutational trajectories between the two 
distinct specificities sampled shows sta- 
tistically significant overrepresentation of 
traversal via promiscuous intermediates. 
Mutational trajectories also show sig- 
nificant enrichment for epistasis, rather 



than additive effect of muta- 
tions, consistent with similar 
findings in the evolution 
of enzyme-substrate interac- 
tions (Weinreich et al., 2006). 
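
The distinction between trajectories that pass through a promiscuous intermediate and switch-like trajectories can be illustrated with a toy example. The sketch below is not the authors' analysis. It assumes a hypothetical two-letter motif (standing in for positions 61 and 64) and hand-picked binding sets, in which `D` is a negatively charged residue, `K` a positively charged one, and `A` a small hydrophobic one.

```python
from itertools import permutations

# Hypothetical toy phenotypes; real data would come from the competitive growth assay.
E3_BINDERS = {"DK", "DA"}   # variants assumed to neutralize ParE3
E2_BINDERS = {"AA", "DA"}   # variants assumed to neutralize ParE2


def phenotype(variant: str) -> str:
    """Classify a variant by which toxins it is assumed to neutralize."""
    e3, e2 = variant in E3_BINDERS, variant in E2_BINDERS
    if e3 and e2:
        return "promiscuous"
    if e3:
        return "E3-specific"
    if e2:
        return "E2-specific"
    return "non-interacting"


def direct_paths(start: str, goal: str):
    """All shortest paths: mutate each differing position once, in every order."""
    differing = [i for i, (a, b) in enumerate(zip(start, goal)) if a != b]
    paths = []
    for order in permutations(differing):
        current, path = list(start), [start]
        for i in order:
            current[i] = goal[i]
            path.append("".join(current))
        paths.append(path)
    return paths


def classify(path) -> str:
    """Label a path by the phenotypes of its intermediate steps."""
    inner = [phenotype(v) for v in path[1:-1]]
    if "non-interacting" in inner:
        return "switch-like"
    if "promiscuous" in inner:
        return "via promiscuous intermediate"
    return "other"


for path in direct_paths("DK", "AA"):
    print(" -> ".join(path), ":", classify(path))
```

In this toy landscape there are two shortest routes from the E3-specific `DK` to the E2-specific `AA`. One passes through the non-interacting `AK`, and the other through the dual-specificity `DA`, which combines the E3-like residue at the first position with the E2-like residue at the second.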

To investigate the important 
question involving co-evolu- 
tion in interacting proteins, 
the authors performed 
another experiment traversing 
the sequence space from 
ParD3/ParE3 to ParD3*/ 
ParE3*. Again, they found a 
prevalence of promiscuous 
intermediate variants, and, 
most importantly, that all pre- 
sumed trajectories traversed 
via at least one promiscuous 
intermediate, suggesting the 
plausibility of this evolutionary 
path. Figure 1B summarizes 
these results, in which an X-Y 
interaction evolves to the or- 
thologous X”-Y’ interaction in 
at least three steps: (1) Muta- 
tion(s) in X to X’ broadens 
specificity, allowing (2) Y to 
form a mutant, Y’, that has 
the potential to interact with 
X as well as X’, and finally (3) 
X’ is mutated to X”, narrowing 
its specificity to Y’. 

Although reconstituting a natural history 
of protein repurposing is challenging and 
cannot be explicitly determined, this work 
contributes important initial observations 
toward this goal. Typically, mutation li- 
braries sample a fraction of the sequence 
space. The approach used in this work al- 
lowed exclusion of infrequent, albeit viable 
trajectories, by focusing on the four most 
relevant positions and targeting only resi- 
dues commonly appearing in contempo- 
rary proteins. Epistatic constraints and 
the occurrence of intermediates of modi- 
fied or reduced function have been 
demonstrated for other types of models, 
including enzyme evolution (Aharoni 
et al., 2005) and receptor-ligand evolution 
(Ortlund et al., 2007), and their practical 
implications have been exploited for 
protein-protein interaction engineering 
(Kortemme et al., 2004). Placed in this 
broader context, Aakre et al. provide new 
evidence for extending these conjectures 
to protein-protein interactions. 

This work also suggests avenues 
for future research. For example, the 
contributions of neutral drift and the 
impact of a few large-effect mutations 
versus many small-effect ones will need 
to be evaluated. Work on protein-protein 
interactions should be extended to other 
systems where cross-reaction is an issue, 
such as in other TA modules documented 
as cross-reacting (Zhu et al., 2010). 
Cross-reaction is also pertinent in other 
types of natural systems, as has been 
shown in some SH3 systems (Zarrinpar 
et al., 2003) and in the evolution of meta- 
bolic pathways (Kim and Copley, 2012). 
Ultimately, there are many other variables 
likely to be relevant in natural evolution 



that will surely be more difficult to ascer- 
tain in experimental systems. As with 
this work by Aakre et al., development of 
other new approaches may be essential 
for dissecting additional features in the 
evolution of protein-protein interactions. 
Bacterial type VI secretion is an offensive and defensive weapon that utilizes a molecular warhead 
to inject toxins into neighboring cells. In this issue of Cell, Whitney et al. report a new class of toxin 
that disrupts the core metabolism of recipient cells and uncover a surprising requirement for EF-Tu. 



Nowhere is life more fiercely competitive 
than in the invisible world of bacteria and 
other microbes. Vast in numbers, these 
diminutive creatures disable their com- 
petitors by assailing each other with a 
range of weapons that include dispers- 
ible small molecules (antibiotics) and 
protein toxins (e.g., colicins). Perhaps 
the most cunning weapon is the type VI 
secretion (T6S) system of Vibrio, Pseudo- 
monas, and certain other Gram-negative 
bacteria, which forms a minuscule spring- 
loaded dagger (a phage-tail-like 
contractile apparatus, complete with a 
molecularly poisoned sharp tip) to 
instantaneously inject protein toxins at 
point-blank range into neighboring cells. 
Evoking the infamous Umbrella Murder, 
in which Bulgarian dissident Georgi Mar- 
kov was assassinated by a ricin-laced 
projectile fired from an umbrella, the 
spear-gun-like T6S system fires into eu- 
karyotic cells and bacteria alike, breach- 
ing their membranes and delivering toxic 



effector molecules with different modes 
of action. In this issue of Cell, Whitney 
et al. (2015) report that a recently discov- 
ered effector poisons cells differently 
from previously known effectors and, 
surprisingly, requires the translation elon- 
gation factor EF-Tu to intoxicate target 
cells. 

The first functionally characterized T6S 
system, from V. cholerae, was revealed 
by its role in warding off predation by 
amoebae (Pukatzki et al., 2006), but T6S 
systems are increasingly viewed as part 
of an arsenal that bacteria use against 
one another. Indeed, bacteria can even 
be seen duking it out, repeatedly attack- 
ing and counterattacking in a process 
termed “dueling” (Basler et al., 2013). 
Characterized T6S effector molecules 
include lipases that target the bacterial 
membrane, peptidoglycan hydrolases 
that degrade the cell wall, and nucleases 
that act on the nucleoid (Figure 1A) (Du- 
rand et al., 2014). Structural and mecha- 



nistic studies of a recently discovered 
effector, called Tse6, by Whitney et al. 
(2015) reveal yet a different mechanism. 
Tse6 resembles diphtheria toxin and 
other toxins that transfer ADP-ribose 
from NAD+ onto proteins to inactivate 
them, but Tse6 is a pure glycohydrolase 
that intoxicates cells by depleting them 
of cytoplasmic nicotinamide adenine 
dinucleotide (phosphate) (NAD(P)+). 
Attacker cells expressing Tse6 are pro- 
tected by its cognate immunity protein, 
Tsi6, which tightly plugs the Tse6 active 
site. 

An enduring question is where in the 
target cell the warhead of effector pro- 
teins is initially delivered. Is it to the peri- 
plasm only, to the cytoplasm, or to both? 
In principle, the phage-tail-like tube of 
the T6S apparatus is long enough to 
penetrate 500 nm into a target cell 
(Basler et al., 2012; Ho et al., 2014), 
which could allow for direct delivery 
into the cytoplasm. But lipases and 
Figure 1. T6S Effectors and Their Sites of Action 

(A) The T6S apparatus delivers effectors to the periplasmic space of Gram-negative target bacteria. It is 
unknown whether it can also breach the peptidoglycan and deliver effectors directly to the cytoplasm. 
Some effectors, such as lipases and peptidoglycan hydrolases (amidases or glycosylases), act within the 
periplasmic space. Others, such as nucleases and the newly discovered NAD(P)+ hydrolase Tse6, act in 
the cytoplasm. 

(B) Model for delivery and translocation of the Tse6 effector. When loaded on the T6S apparatus, the 
hydrophobic transmembrane regions of Tse6 are protected by its chaperone, Eag6 (left). Either in transit or 
once delivered, Eag6 dissociates from Tse6, exposing its three transmembrane regions. Tse6 then 
partitions into the cytoplasmic membrane, where its C-terminal effector domain is drawn into the cytoplasm 
by association with cytoplasmic EF-Tu (center). After being pulled into the cytoplasm, the effector 
intoxicates the target cell via its NAD(P)+ hydrolase activity (right). 



peptidoglycan hydrolases only need to 
reach the periplasmic space, and the 
cell wall might present a barrier to me- 
chanical puncture. On the other hand, 
nuclease effectors and the Tse6 
NAD(P)+ hydrolase require entry into the 
cytoplasm to reach their targets. Until 
an attacking cell is visualized during the 
act of firing into its target, we will be 
forced to make inferences about how 
the toxins reach their targets. However, 
a surprising requirement of Tse6 sheds 
light on the mechanism of effector deliv- 
ery for at least one toxin. 

While investigating the interaction of 
Tse6 with other cellular proteins, the 
authors discovered that the NAD(P)+ 
hydrolase forms a complex with the translation 
elongation factor EF-Tu. The interaction 
is strong and specific, and substitution 
of a single amino acid, identified in a 
Tse6-EF-Tu co-crystal structure, abol- 
ished Tse6 binding to EF-Tu. Remarkably, 
when this mutant Tse6 was introduced 
into attacker cells, they could no longer 
disable target cells, suggesting a specific 
requirement for EF-Tu in the delivery or 
activity of the toxin. 

How are we to understand the unex- 
pected requirement for EF-Tu in Tse6 
toxicity? By using the non-interacting 
mutant, the authors ruled out several pos- 



sibilities. Interaction with EF-Tu is 
required neither for the stability of the 
Tse6 protein nor for its NAD(P)+ hydrolase 
activity. They also showed that it is not 
required for export of Tse6 from attacker 
cells. Thus, by a process of elimination, 
the authors conclude that the binding of 
Tse6 to EF-Tu facilitates the entry of 
Tse6 into the cytoplasm of target cells. 
In their model, Tse6 is delivered by the 
T6S system into the periplasm of a target 
cell along with its chaperone (Eag6), 
which shields its hydrophobic transmem- 
brane regions. Tse6 then inserts into the 
cytoplasmic membrane of the target cell 
and is granted entry to the cytoplasm by 
progressively interacting with EF-Tu as it 
crosses the membrane (Figure 1B). 
Thus, the T6S system does not deliver 
the toxin into the cytoplasm. Rather, 
Tse6 only needs to reach the cytoplasmic 
membrane of the target cell, where it then 
becomes trapped in the cytoplasm by 
EF-Tu. If this model is correct, a comple- 
mentary mutant of EF-Tu that is specif- 
ically blocked in binding to Tse6 would 
be expected to confer immunity to the 
toxin in target cells. 

The Tse6 results do not exclude the 
possibility that other T6S effectors with 
cytoplasmic targets are directly delivered 
to the target cytoplasm, as Tse6 may 
have evolved a special, EF-Tu-depen- 
dent mode of cytoplasmic entry that is 
not shared by other cytoplasmic effec- 
tors. Perhaps other effectors, such 
as the T6S-delivered nuclease of 
P. aeruginosa (Hachani et al., 2014), are 
injected directly into the cytoplasm 
without the aid of target cell proteins. 
Finally, we note that T6S systems are 
not restricted to the delivery of toxins, 
as in the fascinating case of the transfer 
of proteins that mediate self/non-self 
identity between cells of Proteus mirabilis 
(Wenren et al., 2013). 
The current obesity epidemic has focused a great deal of attention on mechanisms controlling 
energy balance. While diet and nutrient absorption affect energy intake, on the other side of the 
equation, energy expenditure is determined by basal metabolism, physical activity, and adaptive 
thermogenesis. Given various challenges in modulating these energy balance mechanisms to com- 
bat human obesity, many efforts have concentrated on how it might be possible to achieve weight 
loss through increased thermogenesis. In this issue of Cell, Kazak et al. describe a previously un- 
recognized molecular pathway for thermogenesis in fat cells. 



Non-shivering thermogenesis occurs pri- 
marily in brown adipose tissue in rodents, 
but has also been detected in so-called 
“beige” adipocytes, thought to reside 
mainly in subcutaneous fat tissue inter- 
spersed with classic white adipocytes 
(Wu et al., 2012; Young et al., 1984). 
Much of human thermogenic fat most 
closely resembles the rodent beige adi- 
pose tissue (Shinoda et al., 2015). Beige 
and brown adipocytes both express 
uncoupling protein 1 (Ucp1), which resides 
on the inner mitochondrial membrane. 
While the electron transport chain drives 
protons into the intermembrane space in 
mitochondria, creating a proton gradient 
across the inner membrane to drive the 
synthesis of ATP (Figure 1A), Ucp1 
creates a pore through which protons 
disperse into the mitochondrial matrix, 
thereby generating heat and uncoupling 
respiration from ATP synthesis (Figure 1B). Cold exposure 
or increased sympathetic activity stimu- 
lated by feeding activates thermogenesis 
through adrenergic activation of Ucp1 
expression (Ricquier et al., 1986; Scar- 
pace et al., 1997). 

Although Ucp1 is well established as an 
important component of thermogenesis, 
investigators have long known that the 



transcriptional regulation of Ucp1 cannot 
fully explain thermic responses. For 
example, the thermic effect of feeding is 
far too rapid to be explained by a tran- 
scriptional effect alone (Scarpace et al., 
1997). Furthermore, Ucp1 knockout 
mice can adapt to chronic cold exposure 
when the temperature transition is 
gradual (Golozoubova et al., 2001). Non- 
shivering thermogenesis has also been 
characterized in muscle, where Sarcolipin 
(Sln) uncouples ATP hydrolysis from Ca2+ 
transport, thereby creating a futile cycle 
that generates heat (Bal et al., 2012). 
However, Ucp1/Sln double-knockout 
mice still retain the ability to maintain ther- 
mal regulation when slowly adapted to the 
cold (Rowland et al., 2015), leaving a gap 
in our understanding of how thermogene- 
sis occurs. 

To fill in this gap, Bruce Spiegelman and 
his colleagues, including mass spec 
expert Steven Gygi, conducted proteomic 
and genomic studies comparing beige, 
white, and brown adipocytes (Kazak 
et al., 2015). KEGG pathway analysis of 
proteins preferentially expressed in beige 
versus brown fat revealed several compo- 
nents of the arginine/creatine and proline 
metabolism pathways. These findings 



were confirmed when the analysis was 
limited to proteins specifically enriched 
in purified mitochondrial fractions. Pro- 
teins that promote both creatine synthesis 
and phosphorylation, including the 
mitochondrial creatine kinase CKMT2 and 
the majority of ATP synthase subunits, 
were elevated in mitochondria from beige 
fat. Creatine kinase (CK) activity was also 
specifically induced in beige fat mito- 
chondria derived from mice exposed to 
cold, suggesting that it is somehow under 
adrenergic control. Together, these find- 
ings hinted that a futile creatine phos- 
phorylation and dephosphorylation cycle 
might somehow be involved in generating 
heat specifically in mitochondria from 
beige adipocytes. 

CK catalyzes the phosphorylation of 
creatine using ATP, generating phospho- 
creatine and ADP. In tissues with high 
ATP demands, such as skeletal muscle, 
the high-energy phosphate bound to cre- 
atine can be transferred to ADP to 
generate cytosolic ATP (Wyss and Kad- 
durah-Daouk, 2000). If creatine were 
serving to regenerate mitochondrial ATP 
through classical CK-mediated phospho- 
transferase activity, it would be expected 
to boost respiration as a molar equivalent 
Figure 1. Different Modes of Mitochondrial Respiration 

(A) Coupled respiration, which generates ATP. 

(B) Thermogenesis through uncoupled respiration by Ucp1, which does not involve ATP synthase. 

(C) Thermogenesis by creatine futile cycle, which requires ATP synthase activity, although no net ATP is generated. 



to ADP, when ADP concentrations are 
limiting. Consistent with this classical 
function of creatine, addition of 0.01 mM 
creatine in the presence of 1 mM ADP to 
mitochondria isolated from classic brown 
fat and muscle had no detectable effect 
on respiration. However, in beige fat mito- 
chondria, this small amount of creatine 
produced a large effect on respiratory 
rate, far exceeding that expected from 
1:1 stoichiometry with ADP, suggesting 
that creatine is regenerated from phos- 
phocreatine via a futile cycle that dissi- 
pates the energy as heat (Figure 1C). 
This idea was supported by direct calo- 
rimetry, which demonstrated that addition 
of small amounts of creatine increased 
heat production in beige, but not brown, 
mitochondria. In contrast to Ucp1-mediated 
thermogenesis, this futile creatine 
cycle requires coupled ATP synthesis, 
although no net ATP is generated. 
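
The stoichiometric argument can be written out as simple arithmetic. In the sketch below, the 0.01 mM creatine figure comes from the text, while the "extra ADP equivalents" value is a purely hypothetical number chosen for illustration, not a measurement from Kazak et al. The function names are likewise assumptions introduced here.

```python
def max_adp_equivalents_mM(creatine_mM: float) -> float:
    """Strict 1:1 stoichiometry: each creatine is phosphorylated once, liberating one ADP."""
    return creatine_mM


def implied_turnovers(extra_adp_equivalents_mM: float, creatine_mM: float) -> float:
    """Average number of cycles per creatine needed to explain the extra ADP turnover."""
    return extra_adp_equivalents_mM / creatine_mM


creatine_mM = 0.01                 # amount of creatine added, from the text
hypothetical_extra_adp_mM = 0.10   # illustrative value only, not a measurement

ceiling = max_adp_equivalents_mM(creatine_mM)
turnovers = implied_turnovers(hypothetical_extra_adp_mM, creatine_mM)

print(f"1:1 ceiling: {ceiling} mM ADP equivalents")
print(f"implied cycles per creatine: {turnovers:.1f}")
```

Any observed stimulation of respiration that corresponds to more ADP equivalents than the 1:1 ceiling implies that each creatine molecule was used more than once, which is the signature of a futile cycle.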

The identification of this futile cycle may 
advance our understanding of beige-fat- 
specific thermogenesis in adult humans 
who possess little, if any, brown adipose tissue (BAT). However, 
numerous questions remain about the 
molecular mechanics of this futile creatine 
cycle. Notable among these is the mech- 
anism of dephosphorylation. While Kazak 
et al. note that the mitochondrial 
phosphatase Phospho1 exhibits an expression 
pattern that suggests its participation in 
this cycle, this phosphatase did not 
catalyze dephosphorylation of phosphocreatine 
in vitro. The authors propose that 
Phospho1 may play a unique role at the 



end of a phosphotransfer chain, but other 
players are probably involved, and it will 
be important to identify the relevant phos- 
phatase or transferase(s) that complete 
this futile cycle. 

An equally important question is how 
the transport and flux of creatine in mito- 
chondria affects the activity of this futile 
cycle. In principle, even diminishingly 
small quantities of creatine could continu- 
ally undergo phosphorylation and de- 
phosphorylation, obviating the need for 
significant creatine synthesis in beige ad- 
ipose tissue. Indeed, creatine levels are 
an order of magnitude higher in brown 
fat, where this futile cycle does not appear 
to be active. Along the same lines, it is not 
clear why the creatine transport inhibitor, 
β-GPA, which reduces creatine levels by 
less than 50%, would have such a profound 
effect on beige fat thermogenesis, 
as it reduced oxygen consumption in 
response to β-adrenergic stimulation in 
beige fat, as well as core body tempera- 
ture of cold-adapted Ucp1 knockout 
mice. Finally, a key question concerns 
how the cycle may be regulated, particu- 
larly in response to adrenergic activation 
of the beige fat cell. 

Additional investigations into this futile 
cycle by genetic and pharmacological 
manipulation of its activity will hopefully 
reveal its relative contribution to energy 
expenditure in humans and whether or 
not it is modulated in obesity. If it does 
prove to be an important component of 
adaptive thermogenesis, therapeutic or 



even dietary agents might be employed 
to activate the process and perhaps 
achieve weight loss in obese individuals. 
Let’s hope that efforts to decipher the 
mechanisms of this cycle are not futile. 
To investigate the fundamental question of how nervous systems encode, organize, and sequence 
behaviors, Kato et al. imaged neural activity with cellular resolution across the brain of the worm 
Caenorhabditis elegans. Locomotion behavior seems to be continuously represented by cyclical 
patterns of distributed neural activity that are present even in immobilized animals. 



The worm Caenorhabditis elegans might 
seem simple, but there’s a lot going on 
both inside and out. It explores the 
world with a series of sinusoidal swims 
punctuated by pirouettes— quick rever- 
sals or sharp turns. Poking the animal 
reliably evokes a reflexive reversal (de 
Bono and Maricq, 2005). Other stimuli, 
like chemicals, can elicit more stochas- 
tic changes, for example, when they 
suggest the presence of food. When 
worms detect decreases in the amount 
of food nearby, the probability they 
will pirouette increases (de Bono and 
Maricq, 2005). This behavior program 
helps worms navigate toward areas 
with high food concentration in a 
non-deterministic manner that’s difficult 
for predators to exploit. How does 
C. elegans produce variable behavior 
sequences? How does it integrate 
diverse sensory information to adjust 
the probabilities of its behaviors? Un- 
derstanding how a nervous system en- 
codes, organizes, and sequences be- 
haviors is a central problem in systems 
neuroscience, whether in worms or humans. 
In this issue of Cell, Kato et al. 
(2015) shed new light on how neural dy- 
namics in the worm produce behavior. 
By combining several new technologies, 
including near-whole-brain imaging of 
neural activity at single-cell resolution, 
Kato et al. (2015) provide evidence 
that motor commands in C. elegans 
are represented across large popula- 
tions of neurons with cyclical dynamics, 
building on and extending related ob- 
servations across a variety of behaviors 
such as digestion (Marder and Calabr- 
ese, 1996) and decision-making and 
across organisms ranging from leeches 
(Briggman et al., 2005) to primates 
(Churchland et al., 2012). 



Kato et al. (2015) imaged neural activity 
across ~100 neurons in the brains of 
worms held immobile in a small channel. 
They used GCaMP, a genetically en- 
coded calcium indicator that elicits green 
fluorescence in active neurons, and a fast, 
commercially available, confocal micro- 
scope to capture volumetric images of 
the entire brain every third of a second. 
More sophisticated microscopes have 
been used to image neural activity in 
larger volumes at higher speeds (Prevedel 
et al., 2014), but the system of Kato et al. (2015) 
achieved single-neuron resolution 
across most of the brain and, when 
combined with the extensive existing 
knowledge of C. elegans neural anatomy, 
supported identification of most neurons. 

Despite being held immobile, the 
worms’ brains fluttered with activity over 
long durations (~20 min per worm). The 
resulting data— 100-dimensional time se- 
ries describing neural activity, one dimen- 
sion for each cell— are difficult to under- 
stand with the naked eye and require 
higher-level analysis. Kato et al. (2015) 
employed principal component analysis 
(PCA) to reduce the high-dimensional 
data to two- or three-dimensional trajec- 
tories (Figure 1A). These dynamical por- 
traits captured the structure of a neuronal 
population in a simpler and more inter- 
pretable form, revealing trajectories of 
neuronal activity that followed a cyclical, 
highly repeatable pattern. That is, one ste- 
reotyped pattern of neural activity is fol- 
lowed by a second stereotyped pattern 
and so on, until the original pattern occurs 
again and the cycle repeats. In addition, 
Kato et al. (2015) observed that the activ- 
ity of many neurons contributed to this 
neural representation, suggesting that 
these neural dynamics, though restricted 
to just a couple of dimensions, were 
distributed across a large number of neu- 
rons. Cyclical neural dynamics have been 
observed in the generation of rhythmic 
motor behaviors like digestion and swim- 
ming (Marder and Calabrese, 1996), as 
well as non-periodic behaviors like reach- 
ing for a target (Churchland et al., 2012). 
Low-dimensional, distributed neural rep- 
resentations also have been observed 
in many systems, including locomotion 
behavior choice (Briggman et al., 2005) 
and odor-identity encoding (Stopfer 
et al., 2003). Both oscillatory and distrib- 
uted neural representations are hypothe- 
sized to be fundamental neural organiza- 
tional strategies (Briggman and Kristan, 
2008), about which numerous open ques- 
tions remain. How does neural activity 
state relate to behavior? How are cyclical 
patterns generated and how do the many 
neurons in this network coordinate their 
activity? What are the advantages of 
these implementations? How are sensory 
information, learning, and the animal’s 
history integrated to change the neural 
representation? 
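
The dimensionality reduction described above can be illustrated with a minimal sketch. It does not reproduce the analysis of Kato et al. (2015): the data are synthetic, each simulated "neuron" is assumed to be a phase-shifted copy of one hidden oscillation plus noise, and the function names, the neuron count, and the noise level are all hypothetical. PCA is computed here by power iteration on the covariance matrix so that only the Python standard library is needed.

```python
import math
import random


def simulate_cyclic_activity(n_neurons=8, n_steps=300, period=50, noise=0.1, seed=0):
    """Synthetic stand-in for imaging data: rows are time points, columns are neurons.

    Each neuron is a phase-shifted copy of one hidden oscillation plus private noise.
    """
    rng = random.Random(seed)
    data = []
    for t in range(n_steps):
        phase = 2.0 * math.pi * t / period
        data.append([math.cos(phase - i) + rng.gauss(0.0, noise)
                     for i in range(n_neurons)])
    return data


def covariance(data):
    """Neuron-by-neuron covariance matrix, plus the mean-centered data."""
    n, d = len(data), len(data[0])
    means = [sum(col) / n for col in zip(*data)]
    centered = [[x - m for x, m in zip(row, means)] for row in data]
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    return cov, centered


def leading_eigenpair(matrix, iters=500, seed=1):
    """Largest eigenvalue and its unit eigenvector, found by power iteration."""
    d = len(matrix)
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    for _ in range(iters):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    value = sum(v[i] * matrix[i][j] * v[j] for i in range(d) for j in range(d))
    return value, v


def pca_2d(data):
    """Project the data onto its first two principal components."""
    cov, centered = covariance(data)
    d = len(cov)
    total_variance = sum(cov[i][i] for i in range(d))
    val1, pc1 = leading_eigenpair(cov)
    # Deflation: subtract the first component so power iteration finds the second.
    deflated = [[cov[i][j] - val1 * pc1[i] * pc1[j] for j in range(d)]
                for i in range(d)]
    val2, pc2 = leading_eigenpair(deflated)
    trajectory = [(sum(x * a for x, a in zip(row, pc1)),
                   sum(x * b for x, b in zip(row, pc2))) for row in centered]
    explained = (val1 + val2) / total_variance
    return trajectory, explained, (pc1, pc2)
```

In this toy case the two-dimensional trajectory traces a closed loop, and both components carry weight from every simulated neuron, which is the sense in which a low-dimensional signal can still be distributed across a population.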

Through a combination of analysis and 
experiment (many exploiting the unique 
experimental capabilities in C. elegans), 
Kato et al. (2015) tried to demystify these 
neural trajectories with more concrete ob- 
servations. They performed a second set 
of experiments in which they imaged neu- 
ral activity in freely behaving worms to 
observe locomotion concomitant with 
neural activity. In these more limited re- 
cordings, they used genetic techniques 
to target calcium indicator expression to 
cells identified as important by PCA 
of the whole-brain data. Surprisingly, 
clusters of the neural trajectory space, 
defined solely on the basis of neural activ- 
ity, corresponded to different locomotion 
behaviors: swimming forward, reversing, 






Cell 163, October 22, 2015 ©2015 Elsevier Inc. 541 








Figure 1. Neural Representation of Locomotion Behavior 

(A) The trajectory of neural activity obtained by near-whole-brain imaging at single-cell resolution in 
C. elegans projected onto the first two principal components. The dynamics are cyclical and stereotyped. 
Color indicates which cluster each segment of neural activity belongs to. Adapted from Kato et al. (2015). 

(B) Regions of neural activity space annotated with the behaviors they correspond to. Adapted from Kato 
et al. (2015). 

(C) State transition diagram describing C. elegans locomotion behavior paralleled by the neural dynamics 
shown in (B). 



and ventral and dorsal turns (Figure 1B). 
Thus, it appears that a large portion of 
the C. elegans neural activity, even in im- 
mobilized animals, encodes the (fictive) 
locomotor state. The neural activity flow 
(Figure 1B) has similarities to a continuous 
version of a behavior state transition dia- 
gram (Figure 1C), a representation often 
used to visualize the types and probabili- 
ties of behavior transitions observed (An- 
derson and Perona, 2014). 

A surprising feature of these patterns 
of neural activity is that they are largely 
self-generated. Eliminating activity in 
an output motor command neuron left 
much of the cyclical activity patterns 
intact, and environmental input also 
seemed to have limited influence on the 
shape of the neural activity manifold. 
Instead, increasing oxygen concentration 
increased the frequency with which activ- 
ity entered regions of neural space asso- 
ciated with reversals. Taken together, 
these observations suggest that a large 
fraction of the brain of the worm is 
constantly oscillating between states 
which, when the animal is freely behaving, 
cause it to perform different locomotion 
behaviors. In this framework, sensory in- 
formation modulates the probabilities of 
entering and leaving these states. Thus, 
these neural dynamics may explain the 
variability exhibited in the worm’s loco- 
motion behavior (Gordus et al., 2015). 
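
The idea that sensory input changes the probabilities of entering and leaving locomotor states, rather than the states themselves, can be sketched as a small Markov chain. This is an illustration only: the three states, every transition probability, and the `reversal_bias` parameter (standing in for a stimulus such as increased oxygen) are hypothetical and are not taken from Kato et al. (2015).

```python
import random

STATES = ["forward", "reversal", "turn"]


def transition_matrix(reversal_bias):
    """Per-step transition probabilities between states (made-up values).

    reversal_bias in [0, 1] raises the chance of leaving forward movement
    for a reversal; all other transitions are left unchanged.
    """
    p_rev = 0.05 + 0.4 * reversal_bias
    return {
        "forward": {"forward": 1.0 - p_rev, "reversal": p_rev, "turn": 0.0},
        "reversal": {"forward": 0.1, "reversal": 0.7, "turn": 0.2},
        "turn": {"forward": 0.8, "reversal": 0.0, "turn": 0.2},
    }


def simulate(matrix, steps, seed=0):
    """Draw a sequence of states, starting from forward movement."""
    rng = random.Random(seed)
    state = "forward"
    sequence = [state]
    for _ in range(steps):
        row = matrix[state]
        state = rng.choices(list(row), weights=list(row.values()))[0]
        sequence.append(state)
    return sequence


def count_entries(sequence, target):
    """Number of times the sequence enters the target state from another state."""
    return sum(1 for a, b in zip(sequence, sequence[1:]) if b == target and a != target)
```

Comparing `count_entries(simulate(transition_matrix(0.0), 5000), "reversal")` with the same call at `reversal_bias=1.0` shows more frequent entries into the reversal state while the set of states and the allowed transitions stay the same.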

Kato et al. (2015) have opened up 
whole-brain imaging in C. elegans as a 
new, powerful system for investigating 
how the brain sequences behavior, as 
well as how cyclical and distributed neural 
representations are generated and main- 
tained. Although behaviorally simpler 
than animals like flies, mice, or primates, 
C. elegans has experimental advantages, 
many demonstrated here: transparency, 
developmental stereotypy, thoroughly 
characterized anatomy, and genetic con- 
trol. In future research, this system might 
be enhanced if whole-brain imaging 
were possible in freely behaving animals, 
making the neural-behavioral relation- 
ships explored here more direct. It would 
also be aided by development of faster 
indicators of neural activity, other fluores- 
cent sensors, and microscopes with 
subcellular resolution. New analytical 
methods for relating neural trajectories 
to behavior might also result in new dis- 
coveries and will become increasingly 
necessary as these techniques are 
extended to more complex animals like 
zebrafish, fruit flies, mice, and, eventually, 
primates and humans. 
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Ethylene regulates many aspects of plant growth and development. In the presence of ethylene, 
the C terminus of EIN2 (EIN2C) translocates into the nucleus and activates transcription. Li et al. 
and Merchante et al. show that EIN2C also regulates translation through an interaction with the 
3' UTRs of transcripts. 



Ethylene is a gaseous plant hormone 
known to affect diverse aspects of plant 
growth and development, including leaf 
abscission, germination, leaf epinasty, 
senescence, and fruit ripening, as well 
as biotic and abiotic stress responses. 
The ethylene-signaling pathway in plants 
is well understood. Ethylene is perceived 
by the ETR1/ETR2/ERS1/ERS2/EIN4 re- 
ceptors on the endoplasmic reticulum 
(ER) membrane. In the absence of 
ethylene, the receptors activate CTR1, a 
Ser/Thr kinase that suppresses the 
ethylene response. This is accomplished 
by phosphorylation of another ER mem- 
brane protein EIN2, a critical component 
of the ethylene signaling pathway (Alonso 
et al., 1999) (Figure 1A). In the presence of 
ethylene, EIN2 is no longer phosphory- 
lated, and its C terminus (EIN2C) is 
cleaved by unknown proteases and trans- 
located into the nucleus where it activates 
the master transcriptional regulators EIN3 
and EIL1 (Qiao et al., 2012) (Figure 1B). 

Ethylene signaling modulates not only the 
activity of EIN3, via the “cleave and shuttle” 
of EIN2, but also its abundance. 
EIN3 is a short-lived protein that is 
degraded by the ubiquitin proteasome 
system in the absence of ethylene, a 
process that is mediated by two ubiquitin 
E3 ligases containing the F-box pro- 
teins EBF1 and EBF2 (An et al., 2010) 
(Figure 1A). In the presence of ethylene, 
EIN2C promotes the degradation of 
EBF1/2, and EIN3 accumulates in the 
nucleus (Figure 1B). 

Although EIN3/EIL1-dependent tran- 
scriptional regulation constitutes a major 
fraction of the ethylene response, the dis- 
covery that some rapid ethylene growth 
responses are EIN2 dependent but don’t 
require EIN3/EIL1 led to speculation that 
the pathway branched after EIN2 (Binder 
et al., 2004). In this issue of Cell, two pa- 
pers by Li et al. (2015) and Merchante 
et al. (2015) may have provided the mo- 
lecular basis for this second pathway in 
Arabidopsis. 

Merchante et al. (2015) start their study 
by asking if ethylene has any effect on the 
translation of specific genes. They use 
genome-wide ribosomal footprinting and 
RNA-seq to identify genes that are trans- 
lationally regulated by ethylene. Interest- 
ingly, EBF1/2 are among these genes, and 
their translation is downregulated by 
ethylene (Merchante et al., 2015). Li 
et al. (2015) come to the same conclusion 
by a different approach. They follow up on 
an earlier observation that mRNA frag- 
ments containing the 3' UTR of EBF1/2 
accumulate in an ethylene-insensitive 
mutant and find that overexpression of 
this 3' UTR results in unresponsive- 
ness to ethylene stimulation (Olmedo 
et al., 2006). This effect is due to 
increased translation of the EBF1/2 
mRNA in the presence of excess 3' UTR 
and a subsequent decrease in EIN3 
levels, suggesting that the 3' UTR of 
EBF1/2 is involved in ethylene-mediated 
translational control of EBF1/2. 
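
Ribosome footprinting data are commonly summarized as a translational efficiency, the ratio of ribosome footprint density to mRNA abundance for a gene. The sketch below assumes that common definition; it is not the analysis pipeline of Merchante et al. (2015), the function names are hypothetical, and the read densities in the usage note are invented purely for illustration.

```python
import math


def translational_efficiency(footprint_density, mrna_density):
    """Ribosome footprint density divided by mRNA density for one gene."""
    if mrna_density <= 0:
        raise ValueError("mRNA density must be positive")
    return footprint_density / mrna_density


def log2_te_change(control, treated):
    """log2 fold change in translational efficiency between two conditions.

    Each argument is a (footprint_density, mrna_density) pair.
    Negative values mean the transcript is translated less efficiently after treatment.
    """
    te_control = translational_efficiency(*control)
    te_treated = translational_efficiency(*treated)
    return math.log2(te_treated / te_control)
```

For a hypothetical transcript whose mRNA level doubles from 20 to 40 while its footprint density halves from 40 to 20, `log2_te_change((40.0, 20.0), (20.0, 40.0))` returns -2.0, a fourfold drop in translational efficiency that would be invisible to RNA-seq alone.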

Protein translational control has several 
advantages. The response to a signal can 
be very rapid, and because the mRNA 
template is not destroyed, regulation is 
easily reversible. The process is often 
mediated by the binding of regulatory pro- 
teins to the 5' or 3' UTR. Moreover, micro- 
RNAs can also bind to the 3' UTR of the 
target RNA to control either mRNA decay 
or translation of the target protein (Jia 
et al., 2013; Mayr and Bartel, 2009). Often, 
mRNAs that are not translated aggregate 
into cytoplasmic mRNP granules, known 
as P-bodies and stress granules, where 
they may be degraded. 

In the case of EBF1/2, both Li et al. (2015) 
and Merchante et al. (2015) show that 
the decreased translation of EBF1/2 by 
ethylene signaling is dependent on EIN2 
but independent of EIN3/EIL1. Following 
up on this genetic evidence, the authors 
further demonstrate that EIN2C binds to 
the EBF1/2 3' UTRs, either directly or 
indirectly, to regulate translation of the 
respective proteins (Figure 1B). Finally, 
the authors show that EIN2C, the EBF1 
3' UTR, and EIN5 all localize in cyto- 
plasmic P-bodies upon ethylene stimula- 
tion (Figure 1B). Thus, it is possible that in- 
hibition of translation is due to recruitment 
of the RNAs to the P-bodies. 

In the paper by Merchante et al. (2015), 
these studies of translational regulation 
converge with a second project, a genetic 
screen to identify new genes that are 
required for ethylene response. Among 
the genes recovered in this screen are 
three members of the nonsense-mediated 
RNA decay machinery, UPF1, UPF2, 
and UPF3. Since the UPFs are known to 
inhibit translation, the effects of the upf 
mutants on EBF1/2 translation were tested 
(Merchante et al., 2015). Indeed, the upf2- 
10 mutation clearly reduces ethylene- 
dependent translational regulation of 
EBF1/2 mRNAs. The authors also provide 
evidence that UPF2 co-localizes with 
EIN2C in the P-bodies and propose that 
the UPF proteins may facilitate the interac- 
tion between EIN2C and the 3' UTR of 
EBF1/2 mRNAs. 

These studies shed light on the function 
of the enigmatic EIN2 protein and deepen 
our understanding of the ethylene-sig- 
naling pathway. There are a number of 
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Figure 1. Transcriptional and Translational Regulation in the Ethylene Signaling Pathway 

(A) In the absence of ethylene, ethylene receptors (ETR1/ETR2/ERS1/ERS2/EIN4) on the ER membrane activate CTR1, a Ser/Thr kinase, which phosphorylates 
another ER membrane protein EIN2C. The F-Box proteins EBF1 and EBF2 bind and degrade the master transcription factors EIN3 and EIL1 via the ubiquitin 
proteasome system, preventing the ethylene-stimulated transcription cascade. 

(B) In the presence of ethylene, receptors perceive ethylene at the ER membrane and no longer activate CTR1. As a result, unphosphorylated EIN2 is cleaved by 
an unknown mechanism, and the cleavage product, the C terminus of EIN2 (EIN2C), shuttles to the nucleus, where it activates the master transcription factors 
EIN3/EIL1 and the downstream transcription cascade. Concurrently, cytoplasmic EIN2C directly or indirectly binds to the 3' UTR of EBF1 and EBF2 mRNAs, 
which inhibits their translation. 



interesting questions that remain to be 
answered. The precise mechanism of 
how EBF1/2 translational regulation is 
achieved by EIN2C is yet to be described. 
How EIN2C specifically recognizes the 3' 
UTR of EBF1/2 transcripts, what features 
of the UTR are important, and if and how 
additional factors are involved in the 
binding and subsequent translational 
repression warrant further investigation. 
It is possible that recruitment of the 
EBF1/2 mRNAs to the P-body is itself sufficient 
to inhibit translation (Maldonado-Bonilla, 
2014). Alternatively, EIN2C, or another 
interacting protein such as UPF2, may 
directly inhibit translation, perhaps by 
preventing the formation of the 43S initia- 
tion complex. Another important ques- 
tion is that of the fate of EBF1/2 RNAs 
once they reach the P-body. Are they 
degraded or can they be released and 
subsequently translated anew? Finally, 
it is still not clear what factors down- 
stream of EIN2 mediate the rapid ethylene 
response. Although the effect of ethylene 
on EBF1/2 translation reported in these 
studies is independent of EIN3/EIL1, 
because EBF1/2 regulate the EIN3 protein 
level, the outcome of EBF1/2 translational 
regulation still requires EIN3. Presum- 
ably, other targets of EIN2-dependent 
translational regulation are responsible 
for the rapid ethylene response observed 
previously (Binder et al., 2004). In any 
case, an enhanced understanding of 
ethylene signaling may have important 
practical benefits. Many aspects of 
plant physiology and development are 
mediated by ethylene, and the ability to 
manipulate the pathway in crops is likely 
to lead to important improvements in 
crop yield. 
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In this Minireview, we discuss basic aspects of germinal center biology in the context of immunity to 
influenza infection and speculate on how the simultaneous evolutionary races of virus and antibody 
may impact our efforts to design a universal influenza vaccine. 



Introduction 

Influenza epidemics cause millions of infections and hundreds of 
thousands of deaths worldwide each year and cost nearly $100 
billion per year in the United States alone. The influenza vaccine 
is generally protective against the strains from which it is 
composed. However, effectiveness wanes as herd immunity 
pushes the viral envelope proteins to mutate and evolve (anti- 
genic drift). Periodically, more antigenically distinct or virulent 
influenza strains arise due to recombination among zoonotic 
strains (antigenic shift). These strains can cause pandemics 
such as the 1918 Spanish flu, which had a death toll of tens of 
millions of people. 

The primary target of anti-influenza antibodies is the hemagglu- 
tinin (HA) protein, a trimer consisting of a membrane (envelope)- 
embedded stalk region and an expanded globular head on which 
the receptor-binding site (RBS) is located. Most protective anti- 
bodies against HA bind to regions surrounding the RBS that are 
highly mutable, which allows antigenic drift and immune escape. 
However, rare antibodies have been isolated that bind function- 
ally critical regions of HA that are much less susceptible to anti- 
genic drift (Krammer and Palese, 2015; Schmidt et al., 2015). 
These antibodies bind either within the RBS, mimicking the sialic 
acid ligands of HA, or to regions of the HA stalk that are critical for 
viral fusion to host cell membranes (Figure 1A). A major goal of 
vaccinologists is to develop a universal vaccine capable of elicit- 
ing protective antibodies to epitopes that are common among 
influenza strains and that are stable over time, thus circumventing 
antigenic variation (Krammer and Palese, 2015). 

Antibodies attain high affinity through somatic hypermutation 
(SHM) of immunoglobulin (Ig) genes in B cells following exposure 
to antigen in a process known as affinity maturation (Eisen, 
2014). Most antibodies to influenza cloned from humans are 
heavily mutated, and these mutations are likely critical for 
broadly protective binding to the virus (Lingwood et al., 2012; 
Pappas et al., 2014; Schmidt et al., 2015). Affinity maturation 
takes place in germinal centers (GCs) (Victora and Nussenzweig, 
2012), where B cells undergo SHM and are subsequently 
selected based on the ability of their mutant Igs to bind antigen. 
A fundamental constraint to this process is that GCs select for 
antibodies with higher affinity for antigen (or some close corre- 
late of it) but are “agnostic” when it comes to their protective ef- 
ficacy, including their ability to neutralize virus or kill infected 
cells and their potential to cross-react with other strains of the of- 
fending pathogen (breadth). For many infectious diseases, this 
“evolution by proxy” is sufficient to provide robust immunity. 
However, in cases like influenza, antigenic drift renders high-af- 
finity protective antibodies from one season ineffective against 
newly emerging strains. 

GC Kinetics and Structure 

Antigenic stimulation triggers specific B and T cells to move to- 
ward the T zone/follicle (T:B) border area of secondary lymphoid 
organs. There, B cells that present antigen-derived peptides to 
helper T cells become “authorized” to engage in a productive 
immune response. Successful B cells enter one of three devel- 
opmental paths: they can differentiate into plasma cells (PCs) 
that secrete early, low-affinity antibody; they can re-establish a 
nonproliferative state and join the memory B cell pool; or they 
can enter the GC reaction (Figure 1B) (Victora and Nussenzweig, 
2012). 

GCs appear several days after antigen exposure as clusters of 
rapidly proliferating cells in the center of B cell follicles. GCs 
comprise two anatomically defined areas: the dark zone (DZ), 
where cells proliferate and hypermutate their Ig genes, and the 
light zone (LZ), where antigen-driven selection takes place (Vic- 
tora and Nussenzweig, 2012). Following DZ hypermutation, B 
cells migrate to the LZ, where antigen is deposited as immune 
complexes on the surface of follicular dendritic cells (FDCs). LZ 
B cells compete to bind and retrieve antigen from FDCs and pre- 
sent it to GC-resident T follicular helper (Tfh) cells. B cells that 
have acquired higher affinity by virtue of SHM are more likely 
to receive positive selection signals, triggering their return to 
the DZ for further proliferation and hypermutation (cyclic re-en- 
try, Figure 1B). GC selection is thus reminiscent of Darwinian 
evolution: iterative cycles of descent with modification (SHM) fol- 
lowed by fitness (affinity)-based selection lead to increased 
fitness of the population as a whole. Sporadic differentiation of 
positively selected LZ B cells into PCs and memory B cells re- 
sults in the progressive increase in the affinity of serum anti- 
bodies over time and upon re-immunization (Figure 1B). 
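
The iterative logic of the GC reaction, mutation in the DZ followed by affinity-based selection in the LZ, can be caricatured in a few lines. This is a deliberately crude sketch under stated assumptions: affinity is a single number, one daughter of each division keeps the parental affinity while the other carries a random change that is harmful on average, and selection simply keeps the best binders. The function name and all numerical parameters are hypothetical, and real selection is probabilistic rather than a strict cutoff.

```python
import random


def germinal_center_cycles(n_cells=100, n_cycles=20, seed=0):
    """Return the mean affinity of the population after each selection cycle."""
    rng = random.Random(seed)
    cells = sorted((rng.gauss(0.0, 1.0) for _ in range(n_cells)), reverse=True)
    mean_affinity = [sum(cells) / n_cells]
    for _ in range(n_cycles):
        # Dark zone: every cell divides; one daughter is unchanged,
        # the other acquires a random affinity change (negative on average).
        daughters = []
        for affinity in cells:
            daughters.append(affinity)
            daughters.append(affinity + rng.gauss(-0.2, 0.5))
        # Light zone: only the best binders are kept for the next cycle.
        daughters.sort(reverse=True)
        cells = daughters[:n_cells]
        mean_affinity.append(sum(cells) / n_cells)
    return mean_affinity
```

Even though most simulated mutations lower affinity, the population mean never decreases from one cycle to the next, because the rare improved variants displace the weakest cells at each selection step.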
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Figure 1. The Germinal Center Response to 
Influenza 

(A) Residue conservation among seasonal H1 HA 
isolates (1975-2005). Conservation is shown for 
one monomer on a scale from blue (most 
conserved) to white (most variable). Red residues 
indicate common glycosylation sites. The RBS is 
marked with a star. Image courtesy of Stephen C. 
Harrison. 

(B) Overview of affinity maturation in the GC. 
Cyclic migration of B cells between light and dark 
zones drives affinity maturation. Prior to GC entry 
and upon positive selection, B cells can differen- 
tiate into the PC or memory fates. 

(C) Proposed model for re-establishment of im- 
munodominance to strain-specific epitopes. Top: 
exposure to Strain 1 of influenza generates a 
response mostly focused on immunodominant, 
Strain-1-specific epitopes (red) but also induces 
a subdominant cross-reactive response (green). 
Middle: exposure to a divergent Strain 2 will 
initially reactivate cross-reactive memory cells 
(green) from the response to Strain 1, generating 
a “broad” response, but will also prime 
Strain-2-specific clones de novo, which will 
eventually outcompete the cross-reactive clones. 
Bottom: re-exposure to Strain 2 will preferentially 
recall Strain-2-specific clones, reinstating im- 
munodominance. 



The cues that trigger B cells to choose between cyclic re-entry 
and differentiation into PCs or memory B cells are unknown. High 
affinity for antigen appears to be a pre-requisite for PC differen- 
tiation and/or survival (Goodnow et al., 2010). However, because 
PC differentiation occurs after clonal expansion, diversion of part 
of a clone into the PC fate does not preclude further diversifica- 
tion of the same expanded clone in GCs. 
Also under scrutiny is the relative propensity 
of memory B cells to re-enter GCs for 
further diversification, rather than exclusively 
differentiating into secondary PCs 
(McHeyzer-Williams et al., 2015). The 
ability to sequentially diversify the same 
clone over multiple responses is likely to 
be crucial to eliciting a broad response 
to influenza. 

Selection of High-Affinity Mutants 
in the GC 

Two models for how affinity-based selection 
operates in GCs are traditionally proposed. 
The first, and simplest, centers on 
antigen-driven signaling through the B 
cell receptor (BCR, comprising surface 
Ig, Igα, and Igβ) as the direct driver of 
selection. In this model, Ig with highest 
affinity for antigen will bind more strongly 
to immune complexes deposited on LZ 
FDCs, which triggers their return to the 
DZ and further proliferation (Victora and 
Nussenzweig, 2012). A recent development 
is the debate over whether BCR 
signaling is even active in GC B cells undergoing selection in 
the LZ and the role that inhibition of BCR signaling by Fc recep- 
tors might play in this process (Espeli et al., 2012; Khalil et al., 
2012). 

The second model of selection proposes that, rather than 
competing for direct signals from antigen, GC B cells compete 
for limiting amounts of Tfh cell help. Here, the primary role of 
the BCR in selection is to trigger endocytosis: B cells acquire 
and present antigen in proportion to the affinity of their Ig. This 
maps Ig affinity— a B cell intrinsic property— onto surface pep- 
tide-MHC (pMHC) density— a feature that can be distinguished 
by Tfh cells. Several aspects of this model have been validated 
experimentally, and forcing interaction of GC B cells with Tfh is 
the only experimental approach so far that has been successful 
in triggering positive selection of GC B cells in vivo (Victora and 
Nussenzweig, 2012). 

These two models are closely interrelated and thus not mutu- 
ally exclusive. For example, strong inhibition by Fc receptors 
could serve to blunt BCR signaling so that B cells rely more 
heavily on T cells for selection. On the other hand, signals from 
T cells could potentially relieve Fc-mediated repression, allowing 
for productive BCR signals only in selected cells. 

Establishing the mechanism of GC-positive selection can have 
important consequences for our understanding of how broadly 
protective antibodies develop. A recent report by Wang et al. 
(2015) has shown that levels of antibodies with sialylated Fcs in 
trivalent influenza vaccine (TIV) recipients 7 days after vaccination 
predicted the affinity of the anti-HA response 2 weeks later. Vacci- 
nation of mice with sialylated (versus non-sialylated) HA immune 
complexes generated antibodies capable of heterosubtypic 
protection in an in vivo challenge model. The authors traced this 
effect back to the upregulation of inhibitory Fc receptor FcγRIIB 
in GC B cells by sialylated Fcs, which would increase the 
threshold for BCR-driven selection, altering the affinity and/or 
specificity of the ensuing response. On the other hand, T cell prim- 
ing with plasmid DNA encoding H1 prior to immunization with 
seasonal vaccine also increased the frequency of cross-reactive 
antibodies directed to the HA stem (Wei et al., 2010). While the 
mechanistic basis for this subversion of immunodominance is 
not clear, a likely scenario is that increased CD4 T cell priming 
may have led to relaxed interclonal competition between B cells 
before and within GCs because T cell help became less limiting. 

Clonal Diversity and Immunodominance in the Antibody 
Response 

The naive B cell repertoire comprises a large number of distinct 
V(D)J rearrangements, each expressed by only one or a few cells 
that proliferate to form clones upon antigenic exposure. B cells 
with low or even undetectable affinity for HA are capable of being 
recruited into GCs (Lingwood et al., 2012). 

Competition between B cell clones (interclonal competition) 
somewhat limits the access of lower-affinity B cells to the early 
GC. This clonality is further restricted in mature GCs by com- 
bined competition between clones and among SHM variants of 
the same clone (intraclonal competition) (Eisen, 2014; Victora 
and Nussenzweig, 2012). Competition is thought to lead to pro- 
gressive loss of clonal diversity in the responding population. 
Thus, only a fraction of B cells remains in the immune response 
long enough to acquire the somatic mutations required to confer 
high affinity, leading to immunodominance. The immune system 
must therefore tune competition to the right level to balance af- 
finity and diversity— if too stringent, average population affinity 
will increase fast but at the expense of diversity, and if too lax, 
diversity will remain high, but affinity will increase only slowly. 
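
The trade-off between affinity and diversity can be made concrete with a toy simulation in which a single parameter sets how stringent competition is. Everything here is an assumption made for illustration: the function name, the number of clones, the mutation size, and the rule that survivors repopulate the pool at random. It is a sketch of the argument above, not a model of any real GC.

```python
import random


def run_response(keep_fraction, n_clones=50, cells_per_clone=4, n_cycles=10, seed=0):
    """Simulate competition and return (mean affinity, number of surviving clones).

    keep_fraction is the share of cells that survive each round of selection:
    small values mean stringent competition, values near 1 mean lax competition.
    """
    rng = random.Random(seed)
    cells = []
    for clone in range(n_clones):
        base_affinity = rng.gauss(0.0, 1.0)  # starting affinity shared by the clone
        cells.extend((clone, base_affinity) for _ in range(cells_per_clone))
    size = len(cells)
    for _ in range(n_cycles):
        # Somatic hypermutation: every cell's affinity changes by a small random amount.
        mutated = [(clone, aff + rng.gauss(0.0, 0.1)) for clone, aff in cells]
        mutated.sort(key=lambda cell: cell[1], reverse=True)
        survivors = mutated[:max(1, int(keep_fraction * size))]
        # Survivors proliferate at random to restore the population size.
        refill = [rng.choice(survivors) for _ in range(size - len(survivors))]
        cells = survivors + refill
    mean_affinity = sum(aff for _, aff in cells) / size
    surviving_clones = len({clone for clone, _ in cells})
    return mean_affinity, surviving_clones
```

Running `run_response(0.2)` and `run_response(0.9)` from the same starting repertoire contrasts the two regimes: stringent selection ends with higher mean affinity concentrated in very few clones, while lax selection retains many clones at lower average affinity.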



Immunodominance appears to be a key factor in preventing 
the emergence of broadly neutralizing influenza antibodies. Anti- 
bodies against epitopes that are conserved between different 
HA variants, such as the HA stem or the RBS, are underrepre- 
sented when compared to antibodies to more variable regions 
on the HA globular head. Potential reasons for this are that anti- 
bodies that bind these conserved epitopes require particular 
amino acid sequence elements (Lee and Wilson, 2015; Schmidt 
et al., 2015) and that conserved regions represent a relatively 
small or inaccessible portion of the HA surface (Figure 1A). 
Conversely, epitopes in the more variable regions of HA that 
are permissive to antigenic drift are more abundant, more acces- 
sible on the intact virion, and can be targeted in a multitude of 
ways. Evolutionary pressure on the virus may have led to the 
development of these variable but immunodominant epitopes 
as decoys, thus protecting conserved sites. 

When an individual is first exposed to a novel influenza strain, 
conserved epitopes are the only ones to which memory B cells 
exist. Thus, novel influenza strains can activate memory B cells 
that are cross-reactive to conserved epitopes, even predomi- 
nantly generating a broad response (Wrammert et al., 2011; 
Ellebedy et al., 2014). However, re-exposure to the novel strain 
will shift the response predominantly toward antibodies to the 
globular head, reinstating its immunodominance (Ellebedy 
et al., 2014). We propose that such immunodominance is due 
in large part to GC (and potentially pre-GC) selection steering 
the antibody response away from conserved but subdominant 
epitopes toward more immunodominant ones (Figure 1C). Two 
factors that could contribute to re-establishing immunodomi- 
nance are a greater potential of epitopes on the variable portion 
of HA to drive affinity maturation and incomplete conservation of 
cross-reactive epitopes between variant influenza strains. Thus, 
cross-reactive memory B cells may have sufficient affinity to 
become PCs and re-enter GCs upon exposure to a novel strain 
but could nonetheless be outcompeted in the course of the 
response by primary B cell clones undergoing de novo affinity 
maturation toward drifted but more immunodominant epitopes 
(Figure 1C). 

The importance of GC selection for immunodominance is illus- 
trated by a recent experiment in mice. Repeated administration 
of a low dose of the immune-suppressant rapamycin during 
influenza immunization abolished the GC response, which sur- 
prisingly was followed by increased resistance to heterosubtypic 
challenge and a change in the HA epitopes targeted by the 
resulting antibodies (Keating et al., 2013). The mechanistic rea- 
sons for this shift are unclear but are likely related to relaxed 
competition in the absence of a GC response. This observation 
suggests that immunodominance of certain regions of HA 
over others is not set in stone and can potentially be over- 
come by optimizing vaccination strategies to skew interclonal 
competition. 

Approaches for Vaccination 

Universal vaccination to influenza would require an antibody 
response that not only neutralizes all existent strains but also 
from which no variant can escape by mutation. Epidemiological 
evidence suggests that responses of such type can be elicited. 
For example, the broadly protective responses of humans to 
the 2009 pandemic H1N1 strain may have caused the eradica- 
tion of the previous H1N1 lineage that had infected humans for 
91 years since the 1918 pandemic but no longer circulates 
(Krammer and Palese, 2015). Epidemiological studies aimed at 
identifying individuals who are completely immune to one or 
more influenza subtypes may help determine the required fea- 
tures of a universally protective immune response in a manner 
similar to what has been achieved in recent years by studying 
HIV-infected individuals (Klein et al., 2013). 

A recent study by Schmidt et al. (2015) provides a glimpse of 
what a universally protective response might look like. In one in- 
dividual, a series of clonally unrelated antibodies were found that 
bind to the conserved RBS pocket from different angles. In this 
case, mutations in the rim of the pocket, which normally render 
RBS antibodies ineffective against antigenic drift, are effective 
in preventing neutralization by only one or a few of these antibodies,
but never all of them. Thus, a “team” of neutralizing
antibodies may be able to perform a function that would be
impossible for any single bNAb.

Several studies in the literature have suggested strategies 
to elicit broadly neutralizing responses (Krammer and Palese, 
2015). These follow two broad lines: the first aims at developing
immunization with a variety of natural or engineered
antigens designed to force the immune system to focus on 
cross-reactive epitopes. These approaches include simulta- 
neous or sequential immunization with different natural HA pro- 
teins, truncated (e.g., stem only) or chimeric (e.g., conserved 
stem, “exotic” head) HA variants, and/or viruses of varied sub- 
type. The rationale is to attempt to overcome immunodominance 
by either eliminating strain-specific immunodominant epitopes 
or providing a competitive advantage to B cell clones that recog- 
nize epitopes common to multiple divergent HAs. Two recent 
reports highlight the promise of such strategies for inducing 
cross-reactive antibodies in multiple animal models (Impagliazzo 
et al., 2015; Yassine et al., 2015). 

The second set of approaches relies on using standard anti- 
gens while manipulating the rules of selection in the antibody 
response. Some of these were discussed above (immunization 
with immune complexes, rapamycin treatment, and DNA priming).
Strategies based on increasing Tfh help have been of particular
interest, given the great emphasis on Tfh cells as the
“judges” of GC selection (Victora and Nussenzweig, 2012). A 
question that remains unsolved is what effect changing Tfh 
numbers has on selection: while fewer Tfh cells may promote 
stronger competition and therefore maximize the rate of affinity 
maturation, more Tfh may be desirable to maximize the size, 
quantity, and duration of GCs and perhaps allow for the appear- 
ance and maintenance of subdominant B cell clones. Evidence 
that adjuvants such as MF59 can expand the breadth of epitopes 
targeted by humans to include more of the conserved epitopes 
provides proof-of-principle evidence that manipulating the 
immune response can lead to increased clonal diversity (Del Giu- 
dice and Rappuoli, 2015). 

Once a broadly protective response can be achieved by vacci- 
nation, a final issue that will need to be addressed is the longevity 
of the broadly protective response. That is, can an established 
broadly protective response resist challenge by immunodominant
responses to drifting or non-protective epitopes eventually?
Further understanding of the basic biology of recall responses 
and maintenance of long-lived PC will be required to address 
these issues. 
Adaptation is the process in which organisms improve their fitness by changing their phenotype using
genetic or non-genetic mechanisms. The adaptation toolbox consists of varied molecular and
genetic means that we posit span an almost continuous “adaptation spectrum.” Different adapta- 
tions are characterized by the time needed for organisms to attain them and by their duration. We 
suggest that organisms often adapt by progressing along the adaptation spectrum, starting with rapidly
attained physiological and epigenetic adaptations and culminating with slower long-lasting genetic 
ones. A tantalizing possibility is that earlier adaptations facilitate realization of later ones. 



When challenged by new conditions, organisms adapt by chang- 
ing their phenotype to improve fitness. The adaptation toolbox 
consists of varied molecular and genetic means: physiological 
acclimation, epigenetic changes, structural re-arrangements of 
the genome, and changes in the DNA sequence. Physiological 
responses, such as gene expression changes, are often the first 
to emerge upon environmental changes. Yet, although physio- 
logical adaptations may confer selective advantage, they are 
not actively amplified, memorized, or propagated over many 
generations. On the next level are epigenetic adaptations, which 
are distinct from the physiological adaptations, as they can have 
varying degrees of self-perpetuation over time, and they may 
occur at the level of DNA and chromatin, RNA, and even protein. As
such, they constitute a molecular “memory.” A next level on 
the spectrum is that of DNA copy-number adaptations, which 
include segmental DNA duplications/deletions that may range 
from specific genes to whole chromosomes. These are relatively 
labile genetic changes, although they do not involve changes in 
the actual nucleotide sequence of the genome. Lastly, genomic 
mutations represent the ultimate level of adaptation in which 
specific changes are stored and inherited relatively faithfully for 
prolonged periods. The diverse adaptations along the spectrum 
differ by several important attributes: the time needed for the 
adaptation to be attained at the individual organism level, the 
time until the adaptation becomes frequent in the population, 
the duration for which the adaptation can be sustained
after the external selective pressure is relieved,
and the faithfulness and accuracy with which
the adaptation is propagated across generations (Figure 1).
When a challenge persists for longer durations, early adaptations 
that have been obtained may be subsequently replaced by a 
more durable adaptation. Indeed, it is often observed that adap- 
tations at the various levels may facilitate one another, e.g., tran- 
scription changes can induce chromatin-based modifications 
(Henikoff and Shilatifard, 2011) and chromosome aneuploidy fa- 
cilitates mutations in the DNA (Sheltzer et al., 2011). Although 
adapting organisms need not necessarily move linearly and 
uni-directionally along this “adaptation spectrum,” the effective
timescales of the different adaptations may dictate a tendency to
move along the spectrum, from the short-lived physiological 
changes toward the long-lasting genetic ones. For example, a 
recent study on malaria discovered that drug resistance acquired by
P. falciparum is a step-wise adaptation process. A non-genetic
adaptation to the drug precedes duplications and mutations
of the gene that confers the drug resistance (Herman
et al., 2014). Along this line, we hypothesize that organisms
can perform a “relay race” on the adaptation spectrum.

Below, we discuss adaptations at each level of the spectrum — 
the context in which they occur and their typical timescales. 
Further, we highlight cases of adaptation that may support the
“relay race” notion, as some adaptations happen sequentially,
each paving the way for the next to occur.

Physiological Adaptations 

When stressed, organisms often acclimate quickly by a series of 
physiological responses. For example, in the yeast S. cerevisiae, 
a significant portion of the transcriptome changes in response to 
diverse environmental stressors such as extreme temperature, 
pH, salinity, and various drugs. These transcriptional plasticity 
responses are often temporary, as fast relaxation of the 
response is observed within minutes or hours, even as the stress 
prevails (Causton et al., 2001; Gasch et al., 2000; Shalem et al.,
2008). What is the nature of this response? Genes that are 
needed to cope with the stress are often induced, e.g., heat-shock
chaperones and antioxidant enzymes, while genes
needed to sustain growth under optimal conditions, like ribosomal
genes, are repressed. The durability of gene expression
changes can vary and in some cases can persist across cellular 
generations. For example, when yeast cells are switched from 
glucose into galactose, they upregulate the galactose genes (Za- 
charioudakis et al., 2007). Yet, if switched back into glucose for 
one generation and then again into galactose, the expression 
response will be faster than at first encounter with galactose, 
suggesting a memory of the previous exposure to galactose. 
Even when grown for up to seven generations away from galac- 
tose, cells still “remember” the previous galactose experience. 
Figure 1. The Different Levels of the Adaptation Spectrum

Two timescales characterize the different adaptation levels: the time needed to 
acquire the adaptation (left axis) and the time along which the adaptation can 
be maintained in the absence of the condition that originally required the 
adaptation. Physiological adaptations consist of changes in current 
biochemical homeostasis, and therefore their duration depends on the lifetime 
of those biomolecules that underlie the adaptation, like mRNAs, proteins, etc. 
Epigenetic modifications typically occur within the same generation that
experiences the trigger, yet their duration depends on the type of epigenetic
mechanism, e.g., DNA methylation, chromatin modification, prions, etc. (reviewed
by Rando and Verstrepen, 2007). Genomic duplications include
segmental duplications and aneuploidy as a result of chromosomal mis-segregation
during the cell cycle, which result in crude changes in gene expression of
the duplicated region. Genetic mutations are functional changes in the coding 
sequence or regulatory regions of genes that alter their function or expression. 

Where and how is this memory stored? It was previously 
believed that the memory is implemented by nuclear factors
that determine the rate of transcription re-activation upon re-encounter
with galactose (Brickner et al., 2007; Kundu et al.,
2007). Yet, a later study clearly showed that the memory is 
stored in the cytoplasm, in the form of a signaling protein (Za- 
charioudakis et al., 2007; Ptashne, 2008). It is thus suggested 
that dilution of the protein in every cell division limits the dura- 
bility of this memory to seven generations. More recent work 
on similar “phenotypic memory” showed a memory of up to 
ten generations when E. coli cells were subjected to rapidly alter- 
nating carbon sources. This memory mechanism, termed 
“response memory,” appeared to be a hysteretic behavior in 
which gene expression persists after removal of its external 
inducer, and this enhances adaptation when environments fluc- 
tuate over short timescales (Lambert and Kussell, 2014). 
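
The dilution argument above can be made concrete with a small calculation. The following is a minimal sketch, not the model used in the cited studies: it assumes that the memory protein is split equally between daughter cells at each division, that no new protein is synthesized and none is degraded, and that memory is lost once the per-cell level falls below a threshold. The function names and the 1% threshold are hypothetical and chosen only for illustration.

```python
def fraction_remaining(generations: int) -> float:
    """Fraction of the original per-cell protein level left after a number
    of divisions, assuming equal partitioning and no synthesis or decay."""
    if generations < 0:
        raise ValueError("generations must be non-negative")
    return 0.5 ** generations


def generations_until_below(threshold: float) -> int:
    """Smallest number of divisions after which the remaining fraction
    drops below the given threshold (a hypothetical memory cutoff)."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = 0
    while fraction_remaining(n) >= threshold:
        n += 1
    return n


if __name__ == "__main__":
    for n in (1, 3, 7, 10):
        print(f"after {n:2d} divisions: {fraction_remaining(n):.4f} of the original level")
    # Illustrative cutoff only: 1% of the starting level.
    print("divisions until below 1%:", generations_until_below(0.01))
```

Under these assumptions, seven divisions leave 1/128 of the starting level per cell, so a cutoff of roughly 1% would be consistent with a memory of about seven generations. The actual threshold is not given in the text.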

In an environment that changes in a predictable manner, gene
expression programs were found to encode “anticipation” of 
the subsequent environmental changes so that genes are ex- 
pressed prior to the occurrence of the stimulus that normally 
activates them (Brunke and Hube, 2014; Mitchell et al., 2009; 
Tagkopoulos et al., 2008). But what if the challenges are unfamil- 
iar? Response to an unforeseen challenge may require a dedi- 
cated strategy. In one study in yeast, a gene that is essential 
under the applied conditions was placed under a promoter that 
precludes its expression (Stern et al., 2007). Cells were thus trap- 
ped in a situation in which they must express a gene, which they 
possess, though in an inaccessible regulatory form. After
approximately ten generations, a solution appears to have
been found, as the population restored the ability to grow. The 
nature of this solution remains largely unknown. A potentially 
useful hint appears to be that, when genome-wide transcription 
was monitored, it was found to be different in each repetition of 
the experiment, suggesting that the cells’ strategy might be to
deliberately introduce noise into their expression program such
that each cell will gamble on a potentially unique solution. In
that respect, it could be appreciated that, like genetic mutations, 
which are predominantly neutral, gene expression changes 
might be neutral too (Koonin, 2007). Yet, whether noisy expres- 
sion is the solution to the unforeseen challenge in this case is still 
an open question. 

Rapid physiological reaction to a challenge is common among 
cells in the population and appears to be a first line of adaptation, 
which is mostly based on a hard-wired reaction to stimuli. This 
reaction is considered adaptive, as it not only improves the 
fitness under the current occurrence of the challenge, but as 
mentioned above, it might also improve the ability of the organ- 
ism to cope with immediately subsequent occurrences of this 
challenge. In our context, physiological adaptation is defined 
by a lack of ability to actively perpetuate a memory. Nonetheless, 
changes in gene expression may serve as a substrate for down- 
stream epigenetic modifications that can prolong their effect. 

Epigenetic Adaptations 

In this section, we distinguish physiological adaptations from the 
next modes of adaptation that feature active mechanisms for 
memory propagation across generations. Despite the contro- 
versy over the diverse definitions of epigenetics (Riddihough 
and Zahn, 2010), it is probably within the consensus that they 
involve some active mechanisms to perpetuate a memory 
across (cellular or even organismal) generations. As we will 
discuss, this epigenetic memory can be implemented at any 
level of the Central Dogma. 

Inheritance by DNA Methylation and Chromatin
Modifications 

Apart from the nucleotide sequence itself, information on the DNA 
can be dynamically modified at two prime levels that constitute a 
major form of “epigenetics.” One prime source of epigenetic in- 
formation is implemented by covalent modification of DNA bases; 
the other is implemented by histones. DNA methylation of CpG dinucleotides
in promoters with CpG islands is generally considered
transcriptionally repressive (Cedar and Bergman, 2009). Due to
the palindromic nature of CpG dinucleotides, methylation of
the C residue in the parental DNA strand can be easily restored
in the new strand after cell division by recognition and methylation 
of hemi-methylated sites. Yet, the capacity to inherit such epigenetic
changes across organismal generations, e.g., in animals, is
limited, since DNA methylation is mostly erased in the early embryo
and then re-established in each individual (Smith et al., 2014).

DNA methylation is not the only epigenetic change that occurs 
on chromatin. Histone modifications, which occur in an impres- 
sive diversity of chemical forms and types, are long known to be 
associated with different states of transcription (Jenuwein and 
Allis, 2001). The “histone code” hypothesis asserts that some 
of the many modifications that take place on histones affect, 
either positively or negatively, transcription levels. However, 
the issue is highly controversial (Henikoff and Shilatifard, 2011),
as the main evidence for association between a particular chro- 
matin modification and transcription activity level is often correl- 
ative rather than causal. Thus, the alternative to the histone code 
hypothesis is that certain marks on chromatin result from, rather 
than cause, a transcriptionally active or repressed state (Henikoff 
and Shilatifard, 2011). The accumulation of evidence in favor of 
each of the two directions may suggest a reconciled reality in 
which transcription state determines certain histone modifica- 
tions, of which some can, in turn, affect transcription. Assume 
conservatively that transcription activation of a gene was regu- 
lated by a conventional transcription factor and that this change 
has consequently affected some histone marks in the vicinity of 
the regulated gene. These and other histone changes may sus- 
tain and perpetuate further the initial transcriptional activation. 
In other words, the mutual effect of transcription activity and histone
marks could serve as a memory loop with improved self-perpetuation
capacity that transmits a purely physiological transcription
change into the longer-enduring epigenetic level. One
demonstration of the effect of chromatin regulation was 
observed in an experiment that confronted flies with an unfamil- 
iar challenge (a toxin), for which they were armed with a defense 
mechanism yet without a suitable regulatory program. Upon the 
first encounter with the toxin, flies had to suppress chromatin re- 
modelers, the Polycomb genes, in order to activate the defense 
gene. This change appears to have led to the de-repression of 
developmental regulators in the affected organ (Stern et al., 
2012), and some of the developmental alterations were epige- 
netically inherited by subsequent generations of unchallenged 
offspring. The possibility that histone marks are transferred 
across generations remains an open issue (Moazed, 2011). 
Nonetheless, recent indications from fission yeast show that 
chromatin marks can be inherited across many cell generations, 
independently of DNA sequence, DNA methylation, or RNA inter- 
ference. Thus, histone marks constitute epigenetic information 
that can be perpetuated long after the removal of the initiating 
trigger (Audergon et al., 2015; Ragunathan et al., 2014). 

RNA Inheritance 

RNA can also transmit epigenetic information between genera- 
tions. In C. elegans, dsRNA-mediated silencing has been shown 
to produce heritable responses (Fire et al., 1998). Recently, it 
has been further demonstrated that this nematode utilizes her- 
itable RNAi responses to cope with environmental stresses. In 
one case, RNAi response was shown to be adaptive by 
silencing an infectious viral genome (Rechavi et al., 2011). In 
addition to viruses, heritable small RNAs serve to ward off other 
genomic parasites such as transposons (Ashe et al., 2012; Shir- 
ayama et al., 2012). In another case, RNA inheritance enabled 
memory of an environmental challenge even when no foreign 
DNA is incorporated: it was shown that RNAi response can 
be inherited following a developmental arrest caused by starva- 
tion (Rechavi et al., 2014). In this context, the effect also 
increased the longevity of the progeny by targeting genes 
with a role in nutrition. Importantly, it was found that this 
induced gene silencing is transmitted in a non-Mendelian 
manner that is not dependent on a DNA template but, rather, 
on an RNA-dependent RNA polymerase, which replicates the 
RNA to a sufficiently high level that overcomes the dilution effect
across generations. In addition, small RNAs specifically
induce the production of new small RNAs that spread also to 
nearby sequences (Sapetschnig et al., 2015). Notably, inheri- 
tance of small RNAs is dependent on specific factors, which 
are required for RNAi inheritance, but not for RNAi per se (Buck- 
ley et al., 2012). 

In summary, RNA has been shown to propagate memory via 
different types of self-reinforcing epigenetic loops. These diverse 
classes of non-coding RNAs emerge as key regulators of gene 
expression, typically by modifying chromatin structure and
silencing transcription (Holoch and Moazed, 2015).

Protein-Based Inheritance

Prions constitute a unique mechanism to perpetuate protein-based
phenotypic changes. Unlike most proteins, prions can assume
more than one stable conformation, in which the prionic
conformation can serve as an auto-catalyst that can convert
other conformations to the prion conformation (DeArmond and
Prusiner, 2003). Importantly, prions can be acquired from the
environment, e.g., through the diet, as in the case of the
prion protein in mad cow disease, or in response to other environmental
changes (Lindquist, 1996). Such is the case of the translation
terminator SUP35 in yeast. In its non-prionic form, this protein
serves as a release factor, needed for the proper translation
termination of the ribosome at STOP codons. Yet, in its prion 
version, this protein aggregates and becomes less effective 
and accurate in terminating translation (Shorter and Lindquist, 
2005). The outcome is therefore an extension of the polypeptide 
beyond the canonical STOP codon in a mechanism that might 
allow proteins to be extended with some stochasticity and poten- 
tially result in a population with enhanced phenotypic diversity 
(Halfmann et al., 2012). Like the above-mentioned stochastic 
transcriptome response and DNA methylation, this mechanism 
too can be activated upon stress (Halfmann and Lindquist, 
2010), thus rapidly disseminating non-genetic diversity when di- 
versity might be most needed. Yet, the feature that makes prion- 
based response truly exciting is its self-perpetuating nature. The 
autocatalytic tendency of the aggregated form appears to act as
an epigenetic memory that perpetuates through generations and
generates non-genetic diversity upon which natural selection
can act. More recent work on another yeast prion, [GAR+], 
demonstrated how a prion can become adaptive by allowing cells 
to utilize more diverse metabolic capacities. Induced by bacteria, 
the [GAR+] prion state allows yeast to switch from purely ferment- 
ing glucose into a more versatile state that allows the simulta- 
neous exploitation of diverse carbon sources (Jarosz et al., 
2014a). Importantly, fitness of [GAR+] cells was found to be 
higher in low-glucose environments compared to cells in which 
the protein is in its non-prion state (Jarosz et al., 2014b). 

In summary, epigenetic adaptation consists of a rich set of 
mechanisms that provide fascinating opportunities for organ- 
isms to rapidly disseminate variability in populations, long before 
genetic changes begin to fixate. Nonetheless, they are not
heritable to the degree that genetic changes are. In the relay 
race context, there is a mutual effect between transcription ac- 
tivity and histone/DNA marks. As for the effect of epigenetics 
on later levels of the spectrum, it seems that the epigenetic architecture
of genes also affects their chances of acquiring duplications
or mutations (discussed below).

Adaptation by Changes in DNA Copy Number

As mentioned above, physiological and epigenetic adaptations 
are often carried out by changes in gene expression. Changes 
in DNA copy number are, in fact, another way to alter gene 
expression, yet this mode of adaptation is fundamentally distinct 
from the mechanisms mentioned above like transcription-factor- 
mediated changes. Genomic copy-number changes scale from 
single genes to aneuploidy (defined here as copy number 
change of whole chromosomes or parts of them). For most 
genes, a change in copy number results in altered mRNA levels 
as well as altered protein levels. This correlation between copy 
number and expression has been demonstrated in various or- 
ganisms, including yeast (Dephoure et al., 2014; Pavelka et al., 
2010a; Springer et al., 2010; Torres et al., 2007), plants (Huettel 
et al., 2008), mice (Kahlem et al., 2004; Lyle et al., 2004), and hu- 
mans (Gao et al., 2007; Henrichsen et al., 2009; Stingele et al., 
2012; Tsafrir et al., 2006; Williams et al., 2008). Therefore, DNA 
copy-number changes can be adaptive under selective pres- 
sures: when elevated expression is beneficial, extra copies can 
be acquired; conversely, when lower expression is beneficial, 
genomic copies can be lost. For example, the copy number of 
the human salivary amylase gene (AMY1) is positively correlated 
with the production level of salivary amylase protein, and populations
with high-starch diets have more AMY1 copies than those
with traditionally low-starch diets (Perry et al., 2007). 

When a higher expression of a specific gene is under selection, 
any genomic duplication that contains this gene has the potential 
to be adaptive. The most precise duplication would be of a small 
locus containing the gene in need. Yet, genomic adaptations are 
assumed to occur randomly, and thus the larger the duplicated 
region is, the higher the chances are for it to include the needed 
gene. For example, parallel E. coli populations evolved under 
limiting lactulose (a lactose isomer) showed duplication-based 
adaptations that varied in length. Although all duplications
included the lactose permease (lacY), the shortest duplication
covered 18 nearby genes and the largest consisted of up to 74
genes (Zhong et al., 2004). Notably, larger duplications come 
with a cost, as they contain many irrelevant genes whose expres- 
sion is altered too. This altered expression of a large number of 
genes simultaneously imposes a significant burden on the cell 
(Bonney et al., 2015; Tang and Amon, 2013). Focusing on large 
copy-number variations like segmental aneuploidy or whole-chromosome
aneuploidy (referred to together as aneuploidy), it is
important to note that despite their substantial cost, they have 
unique advantages and characteristics that distinguish them 
from the other forms of adaptation in the spectrum, as discussed 
below. 

Aneuploidy as a Highly Accessible Evolutionary Solution 

Aneuploidy is caused by mis-segregation of homologous chromosomes
during cell division, and estimates indicate an occurrence
of 1 in 10,000 cell cycles in yeast and up to 1% in mammalian
tissues (Knouse et al., 2014; Thompson and Compton, 2008; Zhu
et al., 2014). Given these frequencies, populations of cells may 
constantly contain a variety of aneuploid cells that may be uti- 
lized as a resource for adaptation when facing a new challenge. 
Indeed, analysis of the yeast gene deletion library revealed that, 
in ~8% of the strains, deletion of a gene led to aneuploidy 
(Hughes et al., 2000). Interestingly, in some of the observed aneuploidies,
the duplicated chromosome was found to harbor a
close homolog of the deleted gene. In another study, a causal 
connection between aneuploidy and drug resistance was 
shown. The fungal pathogen C. albicans repeatedly acquired 
chromosome 5 aneuploidy in response to an antifungal drug 
exposure (Selmecki et al., 2008). The major mechanism by which 
duplication of chromosome 5 confers increased drug resistance 
is by amplifying two genes located in the duplicated chromo- 
some: ERG11 (encoding the drug target) and TAC1 (encoding 
a transcriptional regulator of drug efflux pumps). Another yeast 
study showed that S. cerevisiae that have been evolved for 
~200 generations under sulfate-limited conditions exhibited 
genomic duplications of regions that harbored the SUL1 gene, 
which encodes a high-affinity sulfate transporter (Gresham 
et al., 2008). The rapid fixation of duplication-based adaptations 
mentioned above can be mainly attributed to their high occur- 
rence in genomes and to the fact that duplications amplify 
many genes concurrently. This makes genomic duplications a 
highly accessible local maximum in the fitness landscape, 
whereas other adaptations are more complex and thus require 
longer evolutionary time to be acquired. An interesting hypothe- 
sis is that evolution acts to organize related genes on the same 
chromosome, perhaps even in proximity within the chromo- 
some, so that duplications would elevate these genes together, 
with relatively fewer unrelated “hitchhiker” genes (Janga et al., 
2008). 
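
To illustrate why a mis-segregation frequency of about 1 in 10,000 cell cycles implies a standing pool of aneuploid cells, consider the rough calculation below. It is a sketch under simplifying assumptions that are not taken from the cited studies: every division mis-segregates independently with the same probability, and the fitness cost of aneuploidy and the identity of the affected chromosome are ignored. The function names and population sizes are hypothetical.

```python
import math


def expected_new_aneuploids(dividing_cells: int, missegregation_rate: float) -> float:
    """Expected number of mis-segregation events in one round of division."""
    return dividing_cells * missegregation_rate


def prob_at_least_one(dividing_cells: int, missegregation_rate: float) -> float:
    """Probability that at least one division mis-segregates,
    i.e. 1 - (1 - p)^N, computed in a numerically stable way."""
    if not 0.0 <= missegregation_rate < 1.0:
        raise ValueError("rate must satisfy 0 <= rate < 1")
    return -math.expm1(dividing_cells * math.log1p(-missegregation_rate))


if __name__ == "__main__":
    yeast_rate = 1e-4  # 1 in 10,000 cell cycles, the yeast estimate quoted in the text
    for n in (1_000, 10_000, 1_000_000):  # illustrative population sizes
        print(n,
              round(expected_new_aneuploids(n, yeast_rate), 2),
              round(prob_at_least_one(n, yeast_rate), 4))
```

Under these assumptions, a population of a million dividing yeast cells would be expected to produce on the order of a hundred new aneuploid cells per generation, which is consistent with the idea that such variants are readily available when a new challenge arises.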

The Reversible Nature of Copy-Number-Based 
Adaptations 

Aneuploidy-based adaptations are rapidly gained in evolution 
upon stress, but how reversible are such adaptations when the 
selection pressure is removed? The antifungal drug resistance 
that was facilitated by aneuploidy was shown to be reversible, 
as the extra chromosome was eliminated upon removal of the 
drug (Selmecki et al., 2006). In another study, yeast cells that 
were artificially selected for high expression of a single gene 
showed two types of distinct adaptations: duplication of large 
genomic regions (that contain the gene under selection) and 
trans-acting mutations. When selection was removed, only populations
that adapted by aneuploidy could rapidly revert to base
level (Rosin et al., 2012). This illustrates that adaptations based 
on duplications can serve as an “easy come easy go” adapta- 
tion, as when the stress is relieved, the costly duplication is 
driven out of the population much faster compared to 
sequence-based adaptations. 

The Effectiveness of Aneuploidy for Acute and Abrupt 
Stresses 

Genomic duplications appear to provide a rescue when a selec- 
tive pressure is introduced in an abrupt manner, but would they 
appear also when stress is slowly aggravating? A recent lab evo- 
lution study (Yona et al., 2012) directly tested the effect of the 
stress regime, abrupt versus gradual, on the type of the selected 
evolutionary solution. This work demonstrated that, when yeast 
cells were abruptly shifted from 30°C to 39°C, where they 
evolved for some 500 generations, adaptation was repeatedly 
achieved by duplication of chromosome 3. Yet, populations 
that were evolved under a different heat regime in which temper- 
ature increased gradually (from 30°C to 39°C by +1°C incre- 
ments every 50 generations) did not adapt by genomic 
duplications but, rather, by sequence mutations. This suggests
that, due to the high cost of aneuploidy, it is not an efficient 
response unless the selective pressure is acute and abrupt. Curi- 
ously, genome sequencing of the evolved populations shows 
that populations evolved under the abrupt heat-shock regime 
duplicated chromosome 3 but did not fixate any point mutation, 
while the populations that evolved under the gradual heat regime 
fixated 8-12 point mutations. It is tempting to speculate that this 
result could prove to be more general— that is, other conditions 
that select for aneuploidy under abrupt stress would select for 
changes other than aneuploidy when the same stress is applied 
gradually. 
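
The two heat regimes compared above can be written down as simple temperature schedules. The sketch below is only an illustration of the regime described in the text (30°C to 39°C in +1°C increments every 50 generations). Whether the first increment was applied at generation 0 or after the first 50 generations is not stated here, so the code assumes the latter. The function names are hypothetical.

```python
def gradual_temperature(generation: int,
                        start: float = 30.0,
                        end: float = 39.0,
                        step: float = 1.0,
                        generations_per_step: int = 50) -> float:
    """Temperature in the gradual regime: one increment after every full
    block of generations, capped at the final temperature.
    Assumes the first increment happens after the first block."""
    if generation < 0:
        raise ValueError("generation must be non-negative")
    steps_taken = generation // generations_per_step
    return min(end, start + step * steps_taken)


def abrupt_temperature(generation: int, end: float = 39.0) -> float:
    """Temperature in the abrupt regime: the final temperature from the start."""
    if generation < 0:
        raise ValueError("generation must be non-negative")
    return end


if __name__ == "__main__":
    for g in (0, 49, 50, 200, 450, 500):
        print(g, gradual_temperature(g), abrupt_temperature(g))
```

Under this assumed schedule, the gradual populations reach 39°C only after 450 generations, so for most of the experiment they experience a milder stress than the abruptly shifted populations.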

Genomic Duplications as a Transient Solution that Can 
Be Refined by Focal Adaptations 

When the chromosome 3 aneuploid yeast were further evolved 
for an additional >1,000 generations under high temperature, the
extra copy of chromosome 3 was lost and replaced by a series 
of point mutations (Yona et al., 2012). The state of chromo- 
some duplication thus appears as a transient step, an evolu- 
tionary “stepping stone.” A similar evolutionary dynamic was 
observed in an E. coli study that showed how cells with 
impaired lac operon adapted first by multiple duplications of 
the impaired genes, as means to increase expression (Hen- 
drickson et al., 2002). This amplification not only enabled 
lactose utilization, but also made the lac-operon hypermutable, 
as any additional copies increase the chances of finding bene- 
ficial mutations in this locus. Indeed, shortly after the gene 
amplification, one of the duplicated copies acquired mutations 
that restored a high functional level that led to the subsequent 
elimination of the other low-functional copies (Hendrickson 
et al., 2002). Interestingly, this dynamic may also be relevant
to pathogens that acquire drug resistance in the clinic. A recent
study on clinical isolates of C. albicans suggested that, in some 
isolates, aneuploidies may have an important role as an inter- 
mediate adaptation that subsequently gives rise to more stable 
adaptive genotypes that confer drug resistance (Ford et al., 
2015). To conclude, it seems that prolonged evolution can 
solve the paradox of aneuploidy (Pavelka et al., 2010b; Shelt- 
zer and Amon, 2011): Under normal conditions, selection 
purges fitness-lowering aneuploidy. Yet, under abrupt 
stresses, beneficial aneuploidy is selected because it confers 
higher survivability and proliferation to enable expansion of 
the effective population that can further search the fitness 
landscape for more optimal and slowly acquired solutions. 
Thus, aneuploidy appears as a transient step along the adap- 
tation spectrum that facilitates a path to the next stage: 
sequence-based mutations. 
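
The argument that additional gene copies raise the chance of finding a beneficial mutation can be illustrated with a back-of-the-envelope calculation. The sketch below assumes, hypothetically, that each copy independently acquires a beneficial mutation with a small per-generation probability u; the value of u and the copy numbers are illustrative and are not taken from the cited studies.

```python
import math


def p_at_least_one_hit(u: float, copies: int) -> float:
    """Probability that at least one of `copies` independent gene copies
    acquires a beneficial mutation in one generation.

    Implements 1 - (1 - u)**copies, using log1p/expm1 so that very small
    values of u are handled without loss of precision (requires 0 <= u < 1).
    """
    return -math.expm1(copies * math.log1p(-u))


U_PER_COPY = 1e-9  # hypothetical per-copy, per-generation probability

for copies in (1, 2, 10, 50):
    print(copies, p_at_least_one_hit(U_PER_COPY, copies))
```

For small u the probability grows almost linearly with copy number, which is the sense in which an amplified locus behaves as if it were hypermutable.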

Adaptation of the Genetic Sequence: Mutations on the 
Spectrum 

The adaptation mode most commonly identified with evolution is
genetic mutation. Mutations produce the longest-lasting adaptations,
and unlike the previous stages along the adaptation spectrum, 
they can change not only expression regulatory regimes, but 
also actual protein sequence and function. 

Adaptation through duplications or mutations could serve as 
alternative evolutionary strategies, but which genes go through 
which track? A recent study found that the answer could be in
promoters’ architecture. A careful analysis of nucleosome
arrangement within promoters revealed a surprising deviation 
in some genes with the classical nucleosome-free region (NFR) 
architecture. It was found that genes whose expression is typi- 
cally required at a certain level, like housekeeping genes, tend 
to have an NFR and low transcriptional plasticity, i.e., low transcriptional
variation across conditions (Tirosh and Barkai,
2008). It was further shown experimentally that the expression 
level of these NFR genes features low evolvability, i.e., their 
expression level is relatively insensitive to promoter mutations 
(Hornung et al., 2012). In contrast, genes that require a more dynamic
transcription, like stress genes, typically lack an NFR, and
their transcription is highly plastic and can be effectively altered
by mutations (Hornung et al., 2012; Tirosh and Barkai, 2008). In a
follow-up lab-evolution experiment (Rosin et al., 2012), yeast 
cells were put under short-term selection for higher expression 
of specific genes that were deliberately chosen to represent 
either cases of classical NFR or its absence. Interestingly, 
when the gene under selection had an NFR, selection toward 
higher expression was achieved by large segmental or even
whole-chromosome duplications of regions that harbor the
gene, presumably because of the low ability of these genes to 
elevate transcription by mutations. Conversely, for genes with 
no NFR, higher expression was achieved by mutations and 
with no duplications. This notion that nucleosome architecture 
of genes can create a bias that affects downstream adaptations 
on the spectrum (duplication versus mutations) highlights 
another relay-race dynamic that connects the epigenetic
and genetic levels.

Here, we focus on genetic mutations only in the context of the 
adaptation spectrum and discuss some of the unique features of 
this adaptation mode. According to the “modern synthesis” of 
genetics and evolution, mutations are seeded at random in ge- 
nomes, irrespective of environmental conditions or potential 
phenotypic effects. Nonetheless, there is some evidence suggesting
that genetic mutations can be more accessible in challenging
conditions and perhaps also in specific genomic regions.

Stress-Induced Mutagenesis and Transcription-
Coupled Mutations

Diverse environmental stresses induce a higher rate of mutations 
(or lower efficiency of repair) (Gentile et al., 2011; Giraud et al.,
2001; Loh et al., 2010; Oliver et al., 2000). In that respect, just 
like noise in gene expression or epigenetic noise in DNA methyl- 
ation, mutations introduce diversity in populations especially un- 
der stress, when higher mutation rate may be beneficial. Subse- 
quently, cells carrying mutations that confer higher fitness will 
prevail and improve the population’s fitness. It is possible that 
stress-induced mutagenesis might not be adaptive, and it oc- 
curs simply due to the fact that, under stress, many processes 
in the cell are less accurate. Yet, indications on the precise con- 
trol of error-prone DNA synthesis and some theoretical consider- 
ations on mutagenesis point toward the adaptive nature of 
increased mutagenesis under stress (Ishii et al., 1989; Lynch, 
2010; Sniegowski et al., 2000). Mutagenesis, or DNA repair 
following mutagenesis, changes not only over time, but also 
spatially along the genome (Schuster-Bockler and Lehner, 
2012; Supek and Lehner, 2015). Non-uniform distribution of mu- 
tations in different genomic regions suggests another potential 
feature that may improve the efficient utilization of genomic mutations
as a means of adaptation. What is truly interesting in our
context is the process known as “transcription-coupled muta- 
genesis,” by which the rate of mutation is elevated in proportion 
to transcription rate (Jinks-Robertson and Bhagwat, 2014). Thus,
if transcription regulation allows “reading” different parts of the 
genome in different environments, transcription-coupled muta- 
genesis and related processes (Howan et al., 2012) may allow 
“(re)-writing,” i.e., mutagenizing different parts of the genome 
at different rates under specific conditions. While previous chap- 
ters of this Perspective described protein and RNA inheritance of 
a Lamarckian nature, stress-induced mutagenesis and tran- 
scription-coupled mutagenesis both introduce some Lamarck- 
ian changes to the DNA level as well. Indeed, natural evolution 
seems now more Lamarckian than we thought until recently 
(Koonin and Wolf, 2009). 

Phenotypic Mutations and Their Interaction with
Genetic Mutations 

Another means to diversify the proteome and the transcriptome 
before DNA mutations start to appear is known by the collective 
term “phenotypic mutations,” representing errors in transcrip- 
tion and translation (Burger et al., 2006). Of all processes in the 
Central Dogma, DNA replication often occurs with the highest fidelity.
While the error rate of DNA replication is typically between
$10^{-8}$ and $10^{-10}$ per base per cell cycle, the error rate of transcription
and translation, per nucleotide or per amino acid, could be
up to a million times higher (Gordon et al., 2009; Meyerovich 
et al., 2010; Pan, 2013). Conceivably, phenotypic mutations 
should not propagate between generations, as the lifetimes of 
RNAs and proteins are typically short, even compared to the 
generation time of unicellular species. However, phenotypic mutations
can propagate by triggering a transcription network loop,
by interaction with genetic mutations, or even by assimilation 
into the genome. A study on the lac operon of E. coli, which com- 
prises an autocatalytic positive-feedback loop, has demon- 
strated a heritable epigenetic switch (Gordon et al., 2009, 
2013). In this study, transcription infidelity generated a mutated 
lac repressor with reduced ability to repress the lac operon. 
This led the lac operon to be more sensitive to lactose, i.e., the 
operon could be induced by lower concentration of lactose. In 
such a case, a positive-feedback loop of the induced operon 
state of transcription is propagated across generations. Another 
interesting dynamic that involves phenotypic mutations lies 
within the interaction between phenotypic and genetic muta- 
tions. The “look-ahead mutations” concept (Whitehead et al., 
2008) is a putative form of adaptation in which phenotypic muta- 
tions facilitate the fixation of high-complexity genetic mutations. 
Imagine, for example, that a new disulfide bridge between two 
cysteine residues is beneficial in a specific protein. Creating a 
new disulfide bond requires two genetic mutation events, in 
each of which a non-cysteine is converted into a cysteine. The 
evolutionary catch is that the organism’s fitness does not in- 
crease before the later of the two mutation events occurs, i.e., 
since the first mutation alone would confer no (or even negative) 
fitness gain, it has lower chances of existing in the population. 
The “look-ahead mutation” is a theoretical scenario in which 
one of the two mutations is a phenotypic mutation, while the 
other is a genetic one. Under certain realistic quantitative assumptions
(regarding error rates, etc.), it was shown that,
indeed, a hybrid of phenotypic and genetic double mutant could 
emerge and be sustained. For example, a cell in the population 
that carries the first genetic mutation can obtain a second 
phenotypic mutation with partial functionality that will increase 
its fitness and, thus, its fraction in the population. Following 
that, the phenotypic mutation can be replaced with a fully func- 
tional genetic counterpart that will further increase the fitness 
(see Figure 3 in review by Koonin, 2012). In that respect, the 
phenotypic mutation may serve as an intermediate evolutionary 
“stepping stone” that can be rapidly attained and later replaced. 
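
The look-ahead argument can be illustrated with a toy calculation. The sketch below is not the published model of Whitehead et al. (2008). It simply assumes that a cell carrying the first genetic mutation pays a fitness cost, that a fraction of its protein copies carries the second change through a phenotypic error, and that the benefit scales linearly with that fraction. All function names, parameter names, and values are hypothetical.

```python
def single_mutant_fitness(cost: float, benefit: float, error_fraction: float) -> float:
    """Relative fitness (wild type = 1.0) of a cell that carries only the
    first genetic mutation.

    cost           -- fitness cost of the first mutation on its own (assumed)
    benefit        -- fitness gain of the complete double mutant (assumed)
    error_fraction -- share of protein copies that carry the second change
                      because of a transcription/translation error (assumed)
    """
    return 1.0 - cost + benefit * error_fraction


def look_ahead_pays_off(cost: float, benefit: float, error_fraction: float) -> bool:
    """True if the partial, error-derived benefit outweighs the cost."""
    return single_mutant_fitness(cost, benefit, error_fraction) > 1.0


# Illustrative comparison: a phenotypic-error-like rate versus a
# replication-like rate for producing the second change.
print(look_ahead_pays_off(cost=0.001, benefit=0.5, error_fraction=1e-2))
print(look_ahead_pays_off(cost=0.001, benefit=0.5, error_fraction=1e-8))
```

Under these assumptions the single mutant is favored only when the error-derived fraction exceeds cost divided by benefit, which is why a high phenotypic error rate matters for this scenario.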

Phenotypic mutations might also be assimilated directly into 
the genome, for example, via the process of reverse transcription
(RT). RT is not only used by retroviruses; it also occurs in
cellular life, and it might act on genes in addition to retrotranspo- 
sons (Cordaux and Batzer, 2009). This mechanism appears to be 
relevant in cancer, as a recent study found intron-less versions of 
human genes in cancerous genomes. These newly formed retrogenes
most likely result from reverse transcription of certain transcripts
that are acquired somatically during cancer development
(Cooke et al., 2014). Given the high rate of transcription errors, 
RT could serve as a potential evolvability mechanism by which 
phenotypic mutations become genetic. 

Genetic Redundancy: The Longest-Lasting Adaptation?

We have proposed an adaptation spectrum that culminates in
hard-wired genetic changes. Such changes are indeed stably 
“memorized” by genomes. However, even for a genetic adapta- 
tion, memory is not guaranteed to be indefinite. Mutational drift is 
certainly possible especially in periods when environmental con- 
ditions no longer necessitate the previously adaptive change. In 
this respect, what is more stable than a stable genetic change? 
Perhaps two stable genetic changes. Indeed, biological redun- 
dancy is prevalent in many genomes, and it is often suggested 
to provide a “fail-safe” mechanism, or backup: if one of two 
redundant genes is mutated, the other can still perform the lost 
function, albeit often upon a change in expression program (De- 
Luna et al., 2010; Kafri et al., 2009). The evolutionary stability of 
redundant genetic states is not trivial and can be sustained only 
under certain conditions (Nowak et al., 1997). Nonetheless, it is 
expected that, if the selective pressure that necessitated the 
adaptation is removed, the genetic adaptation will still be sus- 
tained, provided that the process of “neofunctionalization” (He 
and Zhang, 2005) has not yet taken place. 

The processes that can occur along the adaptation spectrum 
have largely been described within the conceptual framework 
of one cell’s lineage, and indeed we have been deploying the 
word adaptation in the context of an individual cell’s (or organ- 
ism’s) improved fitness. However, to translate to evolutionarily 
meaningful changes and, indeed, to meet the more commonly 
recognized meaning of the term “adaptation,” the described 
beneficial changes within a cell/organism need to ultimately 
result in changes at the population level. Our thinking on this is 
as follows. Consider an environmental stress that necessitates 
high expression of a particular gene. Cells in the population that 
highly express this gene, either in response to the stress or 
even prior to its occurrence, will have a temporary advantage 
over other cells. This higher expression may be achieved by sto- 
chastic differences at the physiological or epigenetic level. These 
Figure 2. The Interplay between Different
Modes of Adaptation on the Spectrum 

Illustration of directional interactions between 
adaptation modes, where early adaptations with 
short persistence affect later, more durable, ad- 
aptations. Short descriptions of such interactions 
(arrows) demonstrate how later adaptations are 
more focused according to the trajectory set by 
earlier adaptations. We suggest that all modes of 
adaptation progress as in a relay race to optimize 
the whole process of organismal adaptation. 




cells that exhibit higher expression have a dual advantage in the 
population: first, they benefit from a higher fitness (as long as the 
stress persists); second, due to relay-race dynamics, their line- 
ages may have higher chances of propagating the original 
short-term acclimation into a more sustainable genetic adapta- 
tion. Such cell lineages will thus have an advantage both in 
terms of cell number and in terms of an increased per-cell prob- 
ability of further adaptations. The outcome could be that, by the 
time the stress is relieved and the population returns back to its 
“ground state,” the physiological acclimation has already been 
propagated to the genetic level and it now prevails in the popu- 
lation. 

A Relay-Race Cascade within the Adaptation Spectrum 

We have delineated an evolutionary adaptation spectrum along 
which organisms may progress as they adapt to a new chal- 
lenge. We have taken the risk of generalization in suggesting a 
stereotypical order of exploration along the spectrum, starting 
from the physiological adaptation and gradually moving toward 
the genetic, though deviations from this simple-minded search 
strategy could certainly be envisaged. In this last section, we 
discuss the possibility that, in some cases, realization of a given 
stage along the spectrum could facilitate the progression into a 
next stage, as in a relay race (Figure 2). 

Starting with the physiological level, changes in gene expres- 
sion contribute to the first line of adaptation; nonetheless, they 
may also actively set in motion later modes of adaptation. It 
has been widely suggested that gene expression affects the 
epigenetic “chromatin landscape” (Henikoff and Shilatifard, 
2011). For example, expression of a non-coding RNA induces 
epigenetic silencing of ribosomal genes by interaction with their 
promoter (Schmitz et al., 2010). Physiological changes in gene 
expression may also induce genomic duplications like aneu- 
ploidy. Interestingly, the higher rates of aneuploidy observed af- 
ter stress are connected to the activity of the chaperone Hsp90 
(Chen et al., 2012). In our context, Hsp90 may represent a relay-race
baton that mediates between these two modes of adaptation.
On top of being a stress-response gene, Hsp90 has an
evolutionarily conserved role in the kinetochore assembly (Nii- 
kura et al., 2006), and therefore it also has a role in aneuploidy 
formation. Therefore, physiological modulation of Hsp90 under
stress is suggested to facilitate a search
for adaptations by diversifying the karyo- 
type. Further down the spectrum, the 
classical role of Hsp90 can also be dis- 
cussed in the context of the relay race. 
As a chaperone, Hsp90 was shown to act as an “evolutionary
capacitor”: when active, it appears to mask the effect of mutations
on the phenotype, and when repressed, those cryptic variations
can be exposed (Rohner et al., 2013; Rutherford and Lindquist,
1998). In that respect, this protein serves as a baton in the
relay race, mapping its own expression onto the effect of 
sequence mutation on the phenotype. Finally, physiological ad- 
aptations might also have an effect on genetic mutations. RNA 
transcripts (which typically carry more mutations) can serve as 
a template for DNA repair by homologous recombination of the 
original genomic sequence from which they were transcribed 
(Keskin et al., 2014) or, as mentioned above, by actual integration
of a cDNA reverse-transcription product into the genome, as
shown in cancer (Cooke et al., 2014). Such processes may facilitate
rapid evolution of currently expressed genes, with a useful
bias in favor of highly expressed genes, as they produce more 
RNA copies that can be reinserted into the genome. 

Further down the spectrum, another relay-race dynamic oc- 
curs when genomic duplications promote subsequent genetic 
mutations. Large duplications (like aneuploidy) not only increase 
the mutation rate (Sheltzer et al., 2011), but also favor mutations 
that are related to the duplicated region. First, increased copy 
number increases the probability of a mutation in one of the 
duplicated copies. Second, the excessive cost of large duplications
may affect mutations by favoring those that can replace the
duplication. Any emerging mutation(s) that can replace the dupli- 
cation-based adaptation is reinforced by an additional fitness in- 
crease at the magnitude of the cost that was saved. This added 
advantage increases the likelihood of such mutations to fixate 
faster than other mutations that are not related to the initial dupli- 
cation. In this way, there is a bias on subsequent mutations to 
cope with the selective pressure that led to the initial duplica- 
tion-based adaptation. 
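
The effect of the saved cost on fixation can be sketched numerically. The example below uses Kimura's standard diffusion approximation for the fixation probability of a single new mutant in a haploid population; this is a textbook formula brought in for illustration, not a calculation from the studies cited above. The population size and the selection coefficients are hypothetical, and the sketch assumes a non-negative selective advantage.

```python
import math


def fixation_probability(s: float, pop_size: int) -> float:
    """Diffusion approximation for the fixation probability of one new
    mutant with selective advantage s >= 0 in a haploid population of
    size pop_size: (1 - exp(-2s)) / (1 - exp(-2 * pop_size * s))."""
    if s == 0.0:
        return 1.0 / pop_size  # neutral limit
    return math.expm1(-2.0 * s) / math.expm1(-2.0 * pop_size * s)


POP_SIZE = 10**6          # hypothetical population size
S_UNRELATED = 0.01        # hypothetical advantage of an unrelated mutation
COST_SAVED = 0.02         # hypothetical cost of carrying the duplication

# A mutation that makes the duplication dispensable gains the saved cost
# on top of the same baseline advantage (an assumption of this sketch).
s_replacing = S_UNRELATED + COST_SAVED

print(fixation_probability(S_UNRELATED, POP_SIZE))
print(fixation_probability(s_replacing, POP_SIZE))
```

With these illustrative numbers the replacing mutation is roughly three times more likely to fix than the unrelated one, which is the bias described in the paragraph above.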

In Box 1, we present the example of translational optimization, in
which progression along the adaptation spectrum can be
conceptualized with a specific set of cellular processes.

A major topic that has not been discussed here in great length 
is that of cancer. Cancer is an evolutionary process, and as such, 
it may exploit different adaptations along the spectrum, as well as
the relay race between them. To begin with, the cancerous tran- 
scriptome is known to be radically reprogrammed (c.f., Segal 
Box 1. The Adaptation Spectrum of the Translation Process: A Test Case



Many of the cellular resources, i.e., energy and raw materials, are devoted to ensure adequate translation of proteins. Consequently, translation 
optimization is a major driving force in evolution. One of the factors that governs translation optimization is the balance between supply and demand,
i.e., between the tRNA pool and the codons used by currently expressed mRNAs. Perturbations in the supply-to-demand balance emerge when a new
environment requires the expression of different proteins with a different codon usage (Gingold et al., 2012) or because the availability of some 
tRNAs is altered (Dittmar et al., 2005; Pavon-Eternod et al., 2013; Wiltrout et al., 2012). Examining how cells restore translational balance reveals 
many of the stages along the adaptation spectrum. 
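
The supply-to-demand balance can be made concrete with a toy measure. The sketch below is an illustrative metric chosen for simplicity, not one used in the cited studies: it normalizes codon usage (demand) and tRNA levels (supply) to fractions and reports half the sum of their absolute differences, so that 0 means a perfect match and 1 a complete mismatch. The "green" and "blue" labels and all counts are hypothetical.

```python
def normalize(counts: dict) -> dict:
    """Convert raw counts to fractions that sum to 1."""
    total = sum(counts.values())
    return {key: value / total for key, value in counts.items()}


def supply_demand_mismatch(codon_demand: dict, trna_supply: dict) -> float:
    """Half the summed absolute difference between normalized codon demand
    and normalized tRNA supply (0 = matched, 1 = fully mismatched)."""
    demand = normalize(codon_demand)
    supply = normalize(trna_supply)
    keys = set(demand) | set(supply)
    return 0.5 * sum(abs(demand.get(k, 0.0) - supply.get(k, 0.0)) for k in keys)


# Hypothetical numbers: supply matches demand before an environmental
# shift, then lags behind once demand for the "blue" codon rises.
trna_supply = {"green": 8, "blue": 2}
demand_before = {"green": 80, "blue": 20}
demand_after = {"green": 30, "blue": 70}

print(supply_demand_mismatch(demand_before, trna_supply))
print(supply_demand_mismatch(demand_after, trna_supply))
```

Each stage of the spectrum described in this box can then be read as a different way of bringing such a mismatch back toward zero.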

Measuring expression of the tRNA pool under diverse conditions shows physiological response in which distinct tRNA types are up/downregulated 
(Gingold et al., 2014). On the next level, epigenetics is becoming increasingly appreciated in this context, as histone modifications around tRNA 
genes change dynamically in response to cells’ conditions (Barski et al., 2010; Gingold et al., 2014; Oler et al., 2010), e.g., in cancer. Further on 
the spectrum, duplication of tRNA genes appears to provide elevated expression from certain tRNAs that are in high demand, and indeed, many 
of the tRNA genes occur in multiple-copy families that change their relative sizes in evolution (Man and Pilpel, 2007). Therefore, it will be interesting 
to ask whether some of the chromosome gains and losses observed in cancer correspondingly increase or decrease tRNAs’ availability to support 
the progression of cancer. Next, mutations within tRNAs were also found to be adaptive, presumably in response to a change in the demand-to-supply
ratio. When a tRNA gene is artificially deleted from the yeast genome, another tRNA gene with a different anticodon but of the same amino acid
evolutionarily “responds” with a mutation that converts its anticodon to that of the deleted one (Yona et al., 2013). Such “anticodon switching” was
subsequently found to be very prevalent in the natural evolution of species (Rogers and Griffiths-Jones, 2014; Yona et al., 2013). In each of these 
cases, we do not know whether the anticodon-switching mutation was preceded by earlier transcriptional/epigenetic changes, yet it is tempting to 
speculate that such physiological changes may have constituted an intermediate solution to the challenge before it was solved genetically. Further, 
the tRNA pool probably also realizes the last stage of the adaptation spectrum, i.e., that of genetic redundancy by compensation over mutated 
tRNAs (Bloom-Ackermann et al., 2014). Thus, partial redundancy among tRNAs may act to increase evolutionary stability on one hand and to facil- 
itate evolutionary plasticity of the tRNA pool on the other hand. 

The green and blue ovals represent codons that 
correspond to the anticodons of the green and 
blue tRNAs. Prior to an environmental change 
(upper-left), there is a high usage (translation 
demand) of a certain codon (green oval) 
compared to another codon (blue oval), and 
the tRNA levels (translational supply) match 
accordingly. An environmental shift (upper- 
right) may result in a physiological change 
both at the codon usage (now, the blue codon 
is in higher demand, because mRNAs that are 
enriched in the codon are induced), and tRNA 
levels adjust correspondingly. The higher 
expression of the blue tRNA could then be prop- 
agated into the epigenetic level, e.g., through 
changes in activation or repression-associated 
histone mark in the tRNA genes’ vicinity. Such 
changes in the tRNA pool may be further imple- 
mented by changes to tRNA gene copy number. 
Finally, more copies of the same tRNA gene 
may increase its probability of acquiring both 
functional and regulatory mutations, like anti- 
codon switching. Dashed arrows (gray) con- 
necting the different levels represent hypothe- 
sized relay race between the levels. 




et al., 2004), and recent analyses of cancer epigenomes showed 
that DNA methylation is stochastic rather than precise (Landan 
et al., 2012; Landau et al., 2014). Whether these
changes at the physiological and epigenetic levels are connected
remains unknown. Further along the spectrum, aneuploidy
is a hallmark of cancer, and despite the debate over whether
it is a cause or a consequence of cancer (Sheltzer and Amon,
2011), aneuploidy is suggested to provide cancer with both
a higher mutation rate and a faster means to change dosages
of cancer-driving genes, i.e., upregulating expression of onco- 
genes or downregulating tumor-suppressor genes (reviewed 
by Gordon et al., 2012). Furthermore, since aneuploidy can 
increase copy number of cancer-driving genes and mutation
rates simultaneously, it may lead to a hypermutability effect of 
oncogenes that were duplicated. Taken together, it is intriguing 
to speculate that cancer cells might exploit the relay race notion 
proposed here in gaining more aggressive traits much faster. 

In conclusion, we suggest that the distinct modes of adapta- 
tion have been optimized by evolution not only to perform their 
adaptive function, but also to interact with later modes of adaptation.
In this way, the whole process of adaptation can yield better
results as the fitness landscape is being explored more effi- 
ciently according to the trajectory set by earlier adaptations. 

Generation, transformation, and utilization of organic molecules in support of cellular differentiation,
growth, and maintenance are basic tenets that define life. In eukaryotes, mitochondrial oxygen
consumption plays a central role in these processes. During the process of oxidative phosphoryla- 
tion, mitochondria utilize oxygen to generate ATP from organic fuel molecules but in the process 
also produce reactive oxygen species (ROS). While ROS have long been appreciated for their dam- 
age-promoting, detrimental effects, there is now a greater understanding of their roles as signaling 
molecules. Here, we review mitochondrial ROS-mediated signaling pathways with an emphasis on 
how they are involved in various basal and adaptive physiological responses that control organ- 
ismal homeostasis. 



Mitochondria and Associated Homeostatic and Stress 
Signaling Pathways 

Mitochondria are essential organelles present in all but a few 
mammalian cell types, where they perform multiple functions. 
They are the sites of the tricarboxylic acid (TCA) cycle and 
oxidative phosphorylation (OXPHOS), through which large 
amounts of ATP are generated using the electrochemical 
gradient generated across the inner of two membranes by the 
electron transport chain (ETC). However, their critical roles in 
metabolism go far beyond glucose oxidation via OXPHOS 
and include fatty acid and amino acid metabolism and biosyn- 
thesis of hormones, heme, and iron sulfur clusters. Further- 
more, in addition to metabolism, mitochondria are involved in 
apoptosis, ion homeostasis, and innate immunity, with new 
roles in cell and organismal biology being discovered at an un- 
precedented rate. 

Mitochondria are complex in composition, form, and function. 
Though often depicted as small round or oval structures, they are 
instead usually dynamic, branched networks that constantly 
fuse and divide under control of specific fission and fusion ma- 
chineries (Mishra and Chan, 2014). Proteomic analyses indicate 
that mammalian mitochondria contain ~1,200 proteins, with the
precise composition varying significantly between cell and tissue 
types (Calvo and Mootha, 2010). Thirteen of these proteins are 
encoded by the maternally inherited mitochondrial DNA (mtDNA) 
located in the matrix, while the rest are encoded by nuclear 
genes and targeted to the organelle by specific protein import 
pathways (Shadel and Clayton, 1997). Thus, mitochondrial 
biogenesis and homeostasis, including mtDNA expression and 
maintenance, are under strict control of nuclear gene expression 
programs (Scarpulla, 2008). 



The overall status of mitochondria is constantly monitored, al- 
lowing their number, morphology, distribution, and activity to be 
modulated by developmental, physiological, and environmental 
cues. This requires bi-directional signaling pathways that mediate
crosstalk between mitochondria and the nucleus. Pioneering
studies in budding yeast revealed that mitochondrial
dysfunction leads to so-called “retrograde signaling” events 
that result in adaptive changes in nuclear gene expression and 
metabolism mediated by specific transcription factors (Butow 
and Avadhani, 2004). Mitochondrial retrograde signaling path- 
ways also exist in mammals and are now receiving considerable 
attention because they drive both beneficial and pathogenic 
adaptive responses. 

Given their complicated nature, mitochondrial stress can man- 
ifest in many forms that elicit different stress signals. Reduced 
ETC/OXPHOS capacity can result in cellular energy deprivation 
(e.g., reduced ATP/energy charge), altered mitochondrial ROS 
(mtROS) production, or loss of mitochondrial membrane poten- 
tial, with the precise outcome dictating the specific mitochon- 
drial stress-signaling response (Butow and Avadhani, 2004; 
Sena and Chandel, 2012). Reduced mitochondrial protein 
import, improper assembly of large enzymatic complexes (e.g., 
OXPHOS and ribosomes), and altered chaperone activity can 
cause proteotoxic stress and mitochondrial unfolded protein responses
(Haynes et al., 2013; Rugarli and Langer, 2012). As major
sites of ROS production, mitochondria are also prone to
oxidative damage and stress. Damage, mutation, or depletion 
of mtDNA causes distinct forms of mitochondrial stress and 
downstream signaling (Scheibye-Knudsen et al., 2015; West 
et al., 2015). Finally, altered morphology, dynamics, and distribution
can lead to distinct forms of stress and are linked to

Figure 1. Mitochondrial ROS Signaling
Basics 

Superoxide (O2•−) is generated on both sides of
the inner mitochondrial membrane and hence
arises in the matrix or the intermembrane space
(IMS). Superoxide can be converted to hydrogen
peroxide (H2O2) by superoxide dismutase enzymes
(SOD1 in the IMS or SOD2 in the matrix).
The resulting hydrogen peroxide can cross mem- 
branes and enter the cytoplasm to promote redox 
signaling. Superoxide is not readily membrane 
permeable but may be released into the cytoplasm 
through specific outer membrane channels, as 
shown (see main text). In addition to signaling in 
the cytoplasm directly, both superoxide and 
hydrogen peroxide could, in principle, oxidize or 
modify other molecules in mitochondria that 
can be released/exposed to the cytoplasm to 
signal (redox-sensitive second messenger; X). 
These mitochondrial ROS (mtROS) can generate 
signaling responses and changes in nuclear gene 
expression in multiple ways (shown to the right). 
There are other fates of mtROS that would prevent 
signaling (or potentially enact other signaling and 
damage responses). For example, superoxide can react with nitric oxide (NO) to form peroxynitrite (ONOO−). This would prevent its conversion to hydrogen
peroxide, could cause damage by the highly reactive peroxynitrite, and could potentially limit NO availability for its own type of signaling. Hydrogen peroxide can
be eliminated enzymatically by glutathione peroxidase (Gpx) in the matrix or peroxiredoxins (Prdx) in the matrix and elsewhere in the cell. Peroxiredoxins can also
promote redox signaling by promoting disulfide bond formation in target proteins. Finally, in the presence of transition metals, hydrogen peroxide can generate
damaging hydroxyl radicals (•OH).




mitochondrial turnover by autophagy or mitophagy (Labbe et al., 
2014). Changes in these parameters and associated stress- 
signaling responses can occur downstream of physiological 
(e.g., nutrient limitations, substrate availability, exercise), envi- 
ronmental (exposure to drugs or toxins), and genetic cues. 
With regard to genetics, the importance of these pathways is un- 
derscored by the fact that inherited mitochondrial diseases are 
caused by mutations in genes encoding proteins involved in 
each of these processes that can be inherited maternally (mtDNA 
mutations) or in a Mendelian fashion (nuclear gene mutations) 
(Nunnari and Suomalainen, 2012). However, much remains 
to be learned about these stress pathways. For example, the 
specific sensors of different forms of mitochondrial stress and 
the cell and tissue specificity of the signaling responses remain 
largely unknown. Furthermore, the degree of crosstalk between 
different mitochondrial stress pathways and what determines 
whether they elicit beneficial or maladaptive responses are not 
clear. 

Mitochondrial ROS Signaling 

ROS are formed by one-electron transfers from a redox donor to 
molecular oxygen (O2). This initially generates the anionic free-radical
superoxide that can be converted to hydrogen peroxide
by superoxide dismutase enzymes (Figure 1). Hydroxyl radical 
is another ROS that can be formed (e.g., by metal-catalyzed 
oxidation of hydrogen peroxide), but in this Review, “ROS” re- 
fers to superoxide and hydrogen peroxide unless otherwise 
noted. In mitochondria, the orderly flow of electrons down the 
mitochondrial ETC to complex IV results in their final deposition 
into molecular oxygen to form water. However, electrons can 
also react prematurely with oxygen at sites in the ETC to form su- 
peroxide/hydrogen peroxide (Murphy, 2009). Complexes I and III 
are often regarded as the major sites of mtROS production, but 
more recent studies indicate that at least ten other mitochondrial
enzymes also contribute, including complex II (Quinlan et al.,
2013). It is likely that different sites of mtROS production have
distinct signaling roles and that the primary production sites
change under different physiological conditions (Quinlan et al.,
2013; Sena and Chandel, 2012).
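
The reactions described in this paragraph can be summarized in standard textbook form. The equations below are a schematic sketch, not taken from the cited studies; they show net stoichiometry only and ignore the enzyme-bound intermediates at each ETC site.

```latex
\begin{align*}
\mathrm{O_2} + e^- &\rightarrow \mathrm{O_2^{\bullet -}}
  && \text{(one-electron reduction of oxygen: superoxide)}\\
2\,\mathrm{O_2^{\bullet -}} + 2\,\mathrm{H^+} &\rightarrow \mathrm{H_2O_2} + \mathrm{O_2}
  && \text{(dismutation, catalyzed by superoxide dismutase)}\\
\mathrm{O_2} + 4\,e^- + 4\,\mathrm{H^+} &\rightarrow 2\,\mathrm{H_2O}
  && \text{(complete reduction at complex IV)}
\end{align*}
```

The first two lines correspond to premature electron transfer and its conversion to hydrogen peroxide; the third is the orderly endpoint of electron flow, which produces water rather than ROS.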

That hydrogen peroxide has robust signaling roles in cells was 
elucidated through studies of receptor tyrosine kinase, growth- 
factor signaling that showed bursts of ROS production by 
NADPH oxidase (NOX) enzymes. A major mechanism at play in 
this scenario is the inactivation of redox-sensitive protein tyro- 
sine phosphatases that normally downregulate these receptors 
(via dephosphorylation) by localized NOX-dependent production 
of hydrogen peroxide. Several paradigms emerge from these 
studies that are relevant to mitochondrial hydrogen peroxide 
acting as a signal (Finkel, 2012). First, many NOX enzymes produce
extracellular superoxide that dismutates to hydrogen peroxide,
which is then transported across the plasma membrane, perhaps in
a regulated manner through aquaporin channels, to effect localized
redox signaling. In a similar manner, superoxide
produced in the mitochondrial intermembrane space
or matrix can be converted to hydrogen peroxide by SOD1 or 
SOD2, respectively, allowing it to diffuse into the cytoplasm to 
signal (Figure 1). Whether this involves free diffusion or facilitated 
diffusion through specific channels in mitochondrial membranes 
remains unclear. Second, the inactivation of phosphatases by 
hydrogen peroxide occurs through the modification of specific 
reactive thiol side chains (e.g., cysteine). It is now recognized 
that cysteine residues on many proteins can undergo a variety 
of redox-dependent modifications, including sequential oxida- 
tion (to sulfenic, sulfinic, and sulfonic acid), glutathiolation, and 
S-nitrosation (Go et al., 2015). Like phosphorylation, ubiquitina- 
tion, and other post-translational modifications, these redox 
modifications can alter protein structure and function and be 
regulatory. Therefore, selective oxidation or modification of
redox-dependent thiols in regulatory proteins allows for intricate
cellular redox-switch control mechanisms (Finkel, 2012;
Go et al., 2015). Redox regulatory proteins that associate with 
or are otherwise selectively tuned to read out mitochondrial
hydrogen peroxide production would provide a mechanism 
for mtROS signaling (Figure 1). Perinuclear clustering of mito- 
chondria has also been postulated to be a mechanism for direct 
mitochondrial-nuclear signaling via mtROS (Al-Mehdi et al., 
2012).
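
The sequential cysteine oxidation mentioned in this paragraph can be written out as a short reaction series. This is a generic chemical sketch under the assumption that hydrogen peroxide is the oxidant at every step; R stands for the protein backbone and GSH for reduced glutathione, and the equations are illustrative rather than drawn from the cited references.

```latex
\begin{align*}
\mathrm{R{-}SH} + \mathrm{H_2O_2} &\rightarrow \mathrm{R{-}SOH} + \mathrm{H_2O}
  && \text{(sulfenic acid)}\\
\mathrm{R{-}SOH} + \mathrm{H_2O_2} &\rightarrow \mathrm{R{-}SO_2H} + \mathrm{H_2O}
  && \text{(sulfinic acid)}\\
\mathrm{R{-}SO_2H} + \mathrm{H_2O_2} &\rightarrow \mathrm{R{-}SO_3H} + \mathrm{H_2O}
  && \text{(sulfonic acid)}\\
\mathrm{R{-}SOH} + \mathrm{GSH} &\rightarrow \mathrm{R{-}S{-}SG} + \mathrm{H_2O}
  && \text{(glutathiolation of the sulfenic intermediate)}
\end{align*}
```

Each step up the series adds one oxygen to the sulfur, which is why the degree of oxidation can, in principle, encode the intensity or duration of the hydrogen peroxide signal.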

Superoxide is often summarily dismissed as a relevant 
signaling molecule because of its chemical properties. For 
example, unlike hydrogen peroxide, it is a negatively charged 
molecule and hence not able to easily diffuse across cell mem- 
branes, and it does not engage in protein cysteine oxidation 
reactions conducive to known redox-switch mechanisms of 
signaling (Winterbourn, 2008). However, as we will discuss, 
physiologically relevant, superoxide-mediated signaling does 
appear to exist that is distinct from hydrogen-peroxide-mediated 
signaling pathways. Although not “freely” diffusible, mitochon- 
drial superoxide can be released from the intermembrane space 
into the cytoplasm through the voltage-dependent anion channel 
that spans the outer mitochondrial membrane (Han et al., 2003) 
(Figure 1). This includes superoxide that is generated in the inter- 
membrane space by complex III, as well as that generated in the 
matrix by complex I (and potentially by other enzymes) (Lust- 
garten et al., 2012). However, the latter may require superoxide 
levels to cross a critical threshold (e.g., when antioxidant de- 
fenses are limiting). In the budding yeast, S. cerevisiae, mito- 
chondrial superoxide is released into the cytoplasm by a specific 
isoform of the voltage-dependent anion channel (Por1p) or, in the
absence of Por1p, through the TOM protein import complex
(Budzińska et al., 2009). Thus, superoxide released from mitochondria
can, in principle, participate directly in cytoplasmic
signaling processes (Figure 1). Finally, it is possible that 
mitochondrial matrix superoxide signals to the cytoplasm via 
redox-sensitive second-messenger systems that have yet to 
be defined (Figure 1). 

Mitochondrial ROS Signaling in Organismal Physiology: 
Lessons from Model Systems 

There is now extensive evidence from the study of model organ- 
isms supporting an active role for mtROS signaling in organismal 
physiology and adaptive responses (Hamanaka and Chandel, 
2010; Ristow and Zarse, 2010; Yun and Finkel, 2014). The trac- 
tability of these genetic model systems has led to elucidation 
of new molecular details and signaling pathways underlying 
these responses. In this regard, much has been learned in the 
context of aging and longevity studies, which will be highlighted 
here. 

In the nematode worm, C. elegans, several pathways that 
extend lifespan involve increased mtROS production and 
signaling. These studies have called into question the “mito- 
chondrial” and “free radical” theories of aging (at least as origi- 
nally formulated) by implicating ROS as pro-longevity signals as 
opposed to damaging, pro-aging agents, as they’ve long been 
viewed. Ristow and colleagues broke new ground in this area 
by showing that reduced glucose availability leads to increased 
mitochondrial respiration and mtROS production that delays
worm aging (Schulz et al., 2007). They and others have subse- 
quently found that increased mtROS is a common downstream 
event in many conserved longevity-promoting interventions, 
which has led to the concept of “mitohormesis” (Ristow and 
Zarse, 2010; Yun and Finkel, 2014). Recent work in this area in- 
cludes mtROS signaling in the anti-aging effects of reduced insu- 
lin/IGF signaling and D-glucosamine supplementation (Weimer 
et al., 2014; Zarse et al., 2012). Inhibition of the mitochondrial 
ETC by certain mutations or inactivation of mitochondrial 
SOD2 increases worm lifespan and has been causally linked to 
increased mtROS production (Dancy et al., 2014). Hekimi and 
colleagues have recently shown that this involves a unique 
form of activation of apoptotic signaling cascades to promote 
protective stress responses rather than apoptosis (Yee et al., 
2014). Longevity-extending effects of mtROS in worms are 
also mediated by HIF1 and AMP kinase signaling and are linked 
to some degree to enhanced immunity (Hwang et al., 2014; Lee 
et al., 2010). As in worms, mtROS signaling extends chronological
lifespan in S. cerevisiae, which, in part, is how reduced
TORC1 signaling mediates longevity in this organism (Bonawitz 
et al., 2007; Pan et al., 2011; Schroeder et al., 2013). Here, the 
mtROS signal activates the DNA-damage-sensing kinases, 
Tel1p and Rad53p (yeast orthologs of ATM and Chk2), leading
to enhanced subtelomeric silencing via inactivation of Rph1p,
a histone H3K36 demethylase of the jumonji family of enzymes
(Schroeder et al., 2013). This response, vis-à-vis the
mtROS-mediated, apoptotic-signaling response in worms discussed
above (Yee et al., 2014), suggests that a new paradigm is
emerging whereby canonical stress-response pathways (e.g.,
DNA repair and apoptosis) are utilized differentially to sense
mtROS to elicit adaptive, homeostatic responses in addition to 
the emergency and cell death responses for which they were 
defined originally. 

While the above discussion was limited largely to examples 
from worms and yeast, it is important to note that similar mtROS 
longevity pathways have been shown to operate in other invertebrates
and in mice (Hekimi et al., 2011; Ristow and Schmeisser,
2011). Furthermore, these pathways are not limited to anti-aging 
responses. For example, mtROS signaling has also been impli- 
cated in other homeostatic pathways and processes, including 
wound healing (Xu and Chisholm, 2014), survival under hypoxia 
(Schieber and Chandel, 2014), intracellular pH homeostasis 
(Johnson et al., 2012), cell differentiation (Hamanaka and Chan- 
del, 2010; Hamanaka et al., 2013; Tormos et al., 2011), and 
innate immunity (West et al., 2011). Accordingly, the remainder 
of this Review will be devoted to the role of mtROS in whole- 
body physiology, with the main focus on neuroendocrine control 
of systemic metabolism in mammals. 

ROS Generation and Central Control of Whole-Body 
Metabolism 

The amount of mtROS generated in metabolic processes de- 
pends on the fuel load and type (lipid, carbohydrate, protein), 
as well as the amount, composition, activity, and dynamics of 
mitochondria in the cell or tissue involved. Because ROS are 
de facto by-products of mitochondrial oxidative metabolism, it 
may not be surprising that studies have connected mtROS 
to neuroendocrine control of metabolism, including feeding
behavior, energy expenditure, and glucose homeostasis (Andrews
et al., 2008; Benani et al., 2007; Diano et al., 2011; Horvath
et al., 2009; Leloup et al., 2006; Long et al., 2014).

Figure 2. Schematic Illustration of Hypothalamic Control of Negative Energy Metabolism with Low ROS

(A) In the brain, the hypothalamus contains neuronal populations that control hunger (negative energy balance) and satiety (positive energy balance). Hunger state
is promoted by neurons (purple) that produce Agouti-related peptide (AgRP) and neuropeptide Y (NPY), as well as GABA. When these neurons are active (hunger,
calorie restriction, starvation), systemic metabolism shifts to lipid metabolism with an overall lower level of mtROS production in all tissues.

(B) The activation of AgRP neurons during negative energy balance is promoted by pathways enabling long-chain fatty acid oxidation in the mitochondria, which is
enabled by maintenance of low mtROS generation by engagement of UCP2 and mechanisms that propagate fission and/or proliferation of mitochondria (NRF1,
Sirt1, and PGC1α).

Modulation of mtROS Production by Uncoupling Protein 
2 in the Brain 

The discovery of new members of the uncoupling protein (UCP) 
family in 1997 (Fleury et al., 1997) and the localization of UCP2 to
specific brain areas (Horvath et al., 1999; Richard et al., 1998)
initiated studies by several groups to unmask what these proteins
might do in neurons and in other brain cells. Despite
the wealth of information gained in these studies, great
ambiguity remains about the precise role of these UCPs
in cellular functions (Brand and Esteves, 2005).

There is little debate regarding the functional relevance of 
UCP1 in brown adipose tissue, where it promotes mitochondrial 
fatty acid oxidation by uncoupling mitochondrial electron trans- 
port from ATP generation under adrenergic and thyroid control 
(Ricquier, 1998). Under this scenario, the gained energy is dissi- 
pated in the form of heat, which is the hallmark of non-shivering 
thermogenesis (Ricquier, 1998). UCP2 is clearly not a classical 
uncoupler like UCP1, and its precise mode of action remains unclear.
Furthermore, none of the cells in the brain that express
UCP2 displays the aforementioned features of brown adipo- 
cytes. At baseline, UCP2 in rodents and primates is expressed 
predominantly in neurons of basal structures of the brain (Diano 
et al., 2000; Horvath et al., 1999; Richard et al., 1998). However, 
UCP2 is induced in many brain sites in response to cellular injury 
inflicted by physical insults (Bechmann et al., 2002), epileptic seizures
(Diano et al., 2003), or ischemia (Deierborg et al., 2008).
These seminal observations, together with studies unmasking 
the bidirectional regulatory relationship between UCP2 and 
ROS (Arsenijevic et al., 2000; Echtay et al., 2002; Negre-Salvayre 
et al., 1997), underscore the potential importance of this mito- 
chondrial protein in mtROS control during cellular stress. Pursuit 
of the physiological functions of UCP2 in normal brain gave 
further support for this notion. 

Initial studies in normal brain implicated UCP2 protein expres- 
sion in specific subpopulations of neurons that control hunger, 
energy expenditure, glucose metabolism, and circadian rhythms 
(Horvath et al., 1999). Simultaneously, but independently, the 
sites of action of the hunger-promoting peripheral hormone, 
ghrelin (Cowley et al., 2003), revealed virtually complete overlap 
of ghrelin action in the brain and UCP2 expression. Eventually it 
became clear that most cells in the brain that express ghrelin re- 
ceptors also express UCP2 (Andrews et al., 2008). These obser- 
vations spurred the interrogation of whether the influence of 
ghrelin on appetite and food intake is mediated by UCP2 and, 
if so, via what cellular and intercellular mechanisms (Andrews 
et al., 2008; Diano and Horvath, 2012; Horvath et al., 2009). 
Through these investigations, the following chain of events was 
uncovered regarding UCP2, mtROS, and neuronal activity 
(Figure 2): (1) ghrelin induces NPY/AgRP neuronal firing via 
activation of its receptor, GHSR (growth hormone secretagogue
receptor), which in turn activates AMP kinase (AMPK); (2) AMPK 
activation suppresses acetyl CoA carboxylase (ACC) activity, 
eliminating the inhibitory effect of malonyl-CoA on carnitine 
palmitoyl transferase 1 (CPT1) activity; (3) CPT1 activation
enhances long-chain fatty acid oxidation by mitochondria and
the generation of mtROS; (4) ROS, together with fatty acids,
promote UCP2 gene transcription and activity; (5) UCP2, via
enhancing proton leak, tempers mtROS production, allowing
continuous fatty acid oxidation without oxidative stress burden
and transcription of genes that promote mitochondrial biogenesis
and activity (e.g., NRF1), enabling continuous support of
the bioenergetic needs of sustained firing of NPY/AgRP cells;
and (6) activity of NPY/AgRP neurons results in activity-dependent
synaptic plasticity and inhibition of POMC neurons. The
intracellular signaling of AgRP neurons is supportive of neuronal
activation and decreases vulnerability of these neurons to
cellular stress. It was also suggested that the long-chain fatty
acyl CoAs utilized under these conditions arise from the periphery
under the control of the hypothalamic NPY/AgRP neurons
(Andrews et al., 2008). Knocking out UCP2 diminishes the ability
of AgRP neurons to inhibit mtROS production and maintain low
levels of cellular ROS, which, in turn, impairs their neuronal
functions (Andrews et al., 2008). Consistent with this model, cellular
and electrical activity of AgRP neurons is restored, together with
reversal of impaired feeding behavior, when a ROS-scavenging
cocktail containing L-cysteine is infused into the parenchyma
of the hypothalamus (Andrews et al., 2008). These results
strongly indicate that behaviors associated with low-energy
availability hinge significantly on mtROS signaling associated
with lipid metabolism in key neurons that drive these organismal
adaptations. However, we recognize that UCP2 likely controls
mtROS indirectly, via alteration of mitochondrial fuel utilization,
and that other consequences of UCP2 activity in regulating
metabolism may also be important (Andrews et al., 2008; Diano
and Horvath, 2012; Horvath et al., 2009; Pecqueur et al., 2009;
Vozza et al., 2014).

Figure 3. Schematic Illustration of Hypothalamic Control of Positive Energy Metabolism with Elevated ROS

(A) Satiety (feeling full) is promoted by hypothalamic neurons (beige) that produce pro-opiomelanocortin (POMC)-derived peptides, such as α-MSH, which, in
turn, act on melanocortin-4-receptor-containing neurons. When these neurons are active, systemic metabolism shifts toward glucose utilization, with
enhanced mtROS production contributing to increased cellular ROS in various tissues.

(B) The activation of POMC neurons after a meal is accomplished by ROS, in part driven by increased mtROS production, and is supported by intracellular leptin
(Jak/Stat) and insulin (PI-3K/PTEN) signaling, involving altered K-ATP channel activity.

(C) Recent studies indicate that, under unique circumstances (e.g., activation of cannabinoid receptors; mCB1R), POMC neurons, while still driven by ROS, will
become promoters of hunger rather than satiety because they shift to releasing appetite-stimulating opiates (β-endorphin) through UCP2-dependent
mitochondrial adaptations.

The studies outlined above also suggested the exact opposite 
scenario for those neurons that support cessation of eating once 
enough food is consumed (satiety). The hypothalamic neurons 
that produce pro-opiomelanocortin (POMC) and related pep- 
tides are located in the same area as the hunger-promoting 
AgRP neurons that, when active, suppress POMC neuronal ac- 
tivity. POMC neurons appear to have elevated ROS, as indicated 
by intracellular dihydroethidium (DHE) staining (Andrews et al., 
2008; Diano et al., 2011), when they are active to promote satiety
and increased energy expenditure after food consumption 
(Figure 3). Elevated mtROS production is a logical POMC 
neuronal activator, since it would likely correlate with mitochon- 
drial activation during full oxidation of glucose, the main fuel of 
these neurons (Parton et al., 2007). When ROS levels are sup- 
pressed chemically, POMC neurons are hyperpolarized and their 
firing rate declines (Diano et al., 2011). Conversely, when POMC
neurons in slice preparations are exposed to hydrogen
peroxide, they become depolarized and their firing rate is
elevated (Diano et al., 2011). These results indicate that it is
actually mtROS, rather than glucose itself, that instigate POMC
neuronal firing (Diano et al., 2011; Long et al., 2014). Because
hypothalamic POMC neurons are involved in both behavioral and
autonomic control of energy and glucose metabolism, it is not 
surprising that hypothalamic ROS control has been tied to all 
of these processes (Figure 3). 

ROS Are Satiety Signals

From a behavioral and systemic perspective, under normal 
conditions, elevated hypothalamic ROS levels are permissive 
for suppression of eating, increased energy expenditure, and 
glucose utilization by peripheral tissues (Andrews et al., 
2008; Benani et al., 2007; Diano et al., 2011; Leloup et al., 
2006; Long et al., 2014). In fact, it is reasonable to conclude 
that ROS signaling in the brain, as well as in the periphery, is 
fundamental for appropriate behavioral and autonomic adapta- 
tions to energy surplus (e.g., after consumption of a meal). If 
one interferes with ROS in these physiological processes, 
both behavioral and autonomic correlates of proper fuel man- 
agement of the body will be impaired (Andrews et al., 2008; 
Diano et al., 2011; Horvath et al., 2009; Long et al., 2014). 
Since mitochondrial perturbations via UCP2 modulate these 
responses, we conclude that mtROS signaling plays a funda- 
mental and crucial role in physiological regulation of systemic 
metabolism. 

The relentless pursuit of available energy sources in the envi- 
ronment is mandatory for organismal survival, and hence, hun- 
ger and hunger-controlled circuits motivate this critical behavior. 
However, advanced animal species, including humans, also 
developed the capacity to maintain energy reserves, for 
example, in the form of fat or glycogen, so they do not have to 
continuously feed to survive. This evolutionary advantage, by 
default, demands that “gauges” and “switches” are in place to 
shut off feeding when storage capacity is sufficient. The
hypothalamic melanocortin system appears to represent
both the gauge and the switch, with ROS, likely driven by mtROS 
production, being the sensor. When cellular ROS reach a 
threshold, they activate POMC neurons, enabling the cessation 
of feeding and initiation of storage and utilization of fuels via pro- 
cesses controlled by insulin and leptin (Varela and Horvath, 
2012). The control of insulin release itself is regulated by redox 
events (Bashan et al., 2009). At the same time, elevating ROS 
in the hypothalamus reduces activity of AgRP neurons that prop- 
agate hunger (Andrews et al., 2008) (Figure 2). While it remains 
unknown what underlies the differential effect of ROS on 
POMC and AgRP neurons, cell-specific expression of plasma 
membrane channels could be involved. For example, superoxide 
can directly alter the activity of potassium channels (Avshalumov 
and Rice, 2003), which play important roles in glucose sensing 
by hypothalamic neurons and are under regulatory control by
UCP2 (Parton et al., 2007). Likewise, it remains unclear how 
mtROS signals reach neuronal perikarya. In this regard, it is note- 
worthy that mitochondria in both AgRP and POMC neurons 
dynamically change their morphology and localization within a 
short period of time (Andrews et al., 2008; Coppola et al., 
2007; Dietrich et al., 2013; Schneeberger et al., 2013). Mitochondrial
fission and fusion capacity and the physical interaction
between mitochondria and the endoplasmic reticulum may be 
involved in mtROS production or associated signaling events 
(Dietrich et al., 2013; Nasrallah and Horvath, 2014; Schnee- 
berger et al., 2013). 



It is important to note that ROS are likely not the satiety signal 
in the hypothalamus under all circumstances. For example, the 
known effect of cannabinoids on promoting voracious appetite,
regardless of metabolic satiety, is actually mediated by
mtROS-driven POMC neurons (Koch et al., 2015). However, under
cannabinoid influence, POMC neurons reverse their function
and promote appetite because they switch from the release of their
satiety-promoting neuropeptide, α-melanocyte-stimulating hormone
(α-MSH), to β-endorphin (Koch et al., 2015). This switch
is enabled by a mitochondrial adaptive response controlled by
UCP2 (Koch et al., 2015) (Figure 3C). Whether this response is
specific for this pharmacological situation or relevant to regula- 
tion of metabolism under certain physiological circumstances 
is unknown. 

Mitochondrial ROS and Exercise 

A key feature of animal species is their need to physically relo- 
cate in a rapid and predictable fashion to find food, reproduce, 
or escape danger. This is accomplished, broadly speaking,
through behavioral adaptations, which arise from coordinating
sensory and effector functions via communication between
the nervous system (the sensory component) and the
musculoskeletal system (the effector component). Movement 
and exercise, in general, have long been recognized as key to 
supporting not only the aforementioned fundamental biological 
needs, but also tissue health and longevity. Ristow and col- 
leagues showed that suppressing ROS generation during aero- 
bic exercise (likely, in large part, mtROS) diminishes beneficial 
outcomes on many exercise-related parameters (Ristow et al., 
2009). They also argue for the critical relevance of mtROS 
signaling transients as important contributors to longevity, as 
well as mediators of other signaling responses that promote 
healthspan and longevity (Schmeisser et al., 2013; Zarse et al., 
2012). In support of the notion that mtROS transients (and not 
sustained elevated ROS levels) mediate the benefits of exercise 
on integrative physiology, UCP2 was found to be crucial for
exercise-induced synaptogenesis in the dentate gyrus of the hippocampal
formation, a key site of spatial learning (Dietrich et al.,
2008). This same mechanism is also relevant to hippocampal 
development (Simon-Areces et al., 2012) and lifespan determi- 
nation (Andrews and Horvath, 2009). 

Short- and Long-Term Effects of ROS 

The distinction between physiologically beneficial short ROS 
bursts and pathological, sustained high ROS levels on cellular 
and circuit integrity is best illustrated in the response of the hypo- 
thalamus to exposure to calorie-dense diets containing high 
levels of fats and carbohydrates (Diano et al., 2011; Parton 
et al., 2007). On regular chow diet, which contains <20% fat, 
ROS levels fluctuate between hunger and satiety states and 
mice maintain a positive correlation between hypothalamic 
ROS levels, circulating leptin levels (the adipose hormone that 
signals to the hypothalamus when a sufficient amount of food is
consumed), and activity of POMC neurons (Diano et al., 2011). 
However, when animals are placed on calorie-dense diets 
(>40% fat), homeostatic control of energy metabolism is gradu- 
ally deregulated. When this occurs, animals steadily increase 
their fat stores, which results in steady elevation of circulating
leptin. Under homeostatic conditions, this elevated leptin would
decrease feeding and enable maintenance of fat stores, but on
calorie-dense diets it fails to do so. This
inability of elevated leptin levels to prevent or reverse weight gain
is called leptin resistance, for which many cellular and tissue
mechanisms have been proposed. One of these
mechanisms relates to the aforementioned mtROS control in 
POMC and AgRP neurons. On a high-fat diet, the positive corre- 
lation between hypothalamic ROS, circulating leptin levels, and 
POMC neuronal activity in mice deteriorates (Diano et al., 
2011). Under these conditions, hypothalamic ROS levels plateau 
and do not follow the robust and steady elevation in circulating 
leptin concentrations. At the same time, POMC neuronal activity 
is diminished (Diano et al., 2011). The underlying cause for this 
dysregulation is tied to PPARγ-related proliferation of peroxisomes
and increased ROS elimination by catalase (Diano
et al., 2011; Long et al., 2014). We suggest that leptin affects 
this process by enabling increased glucose uptake in POMC 
and AgRP neurons, which is accompanied by increased lipid 
load. Initially, this will lead to increased mtROS production and 
crossing cellular ROS thresholds that promote satiety by acti- 
vating POMC neurons. However, rising leptin levels on high-fat 
diet will continue to promote glucose uptake by these neurons, 
which, together with increasing lipid load, will overburden these 
postmitotic cells. The scenario in which lipid and carbohydrate 
load is increasing in cells provides a perfect energetic basis for 
growth. Carbohydrate oxidation will dominate mitochondrial 
OXPHOS and ATP generation, while long-chain fatty acids are 
diverted from mitochondria via the malonyl-CoA shuttle for 
biogenesis of membranes and cell growth. However, in cells 
whose growth is strictly limited, such as neurons of the adult cen- 
tral nervous system, lipids cannot be continuously utilized for 
membrane biogenesis and they will accumulate in various intra- 
cellular compartments, including the endoplasmic reticulum. 
The activation of peroxisome proliferation under these circum- 
stances provides an alternative mechanism through which 
excess fat within cells can be eliminated via mitochondrial
β-oxidation that is less coupled to ATP generation. While this is a
beneficial process to prevent lipotoxicity, peroxisomal catalase 
activity will limit ROS generation needed for proper signaling 
from the hypothalamus to diminish feeding and increase energy 
expenditure (Diano et al., 2011). These alterations may have mul- 
tiple negative effects on cellular signaling mechanisms and 
organelle integrity and function. 

A “Fuel Hypothesis” of Cellular Function 

While the above details on mtROS-related mechanisms were 
described in relation to a hypothalamic circuit that controls 
feeding behavior and peripheral fuel partitioning and utilization, 
these processes are likely relevant to the functionality and 
impairment of neurons in various parts of the brain. For example, 
UCP2-dependent control of mtROS was also found in dopamine 
neurons in the midbrain substantia nigra, where both normal 
functioning of these cells and their protection under cellular 
stress were attributed to this mechanism (Andrews et al., 2005, 
2006; Conti et al., 2005). Dopamine cells in this area of the brain 
are connected to control of fine motor functions and complex 
motivated behaviors. The peripheral metabolic hormone ghrelin
was found to modulate the activity of these neurons and to prevent
their impairment and death in models of Parkinson’s disease
(Abizaid et al., 2006; Andrews et al., 2009). The intracellular
signaling pathway that enabled these
beneficial effects of ghrelin was related to ROS control by the 
same machinery as described above in relation to the control 
of feeding (Andrews et al., 2008). Similar ghrelin action was 
also found in the hippocampus to promote learning and memory 
and to ameliorate deficits of animals in a model of Alzheimer’s 
disease (Diano et al., 2006). Because the changing metabolic 
state (hunger versus satiety) is closely tied to predictable changes
in complex behaviors, it is reasonable to suggest that fluctuating 
ROS levels (mediated by alterations in mtROS output) in all or 
part of the brain play a critical regulatory role in the synchroni- 
zation of neuronal circuit activity in support of continuous 
and appropriate behavioral adaptations. Furthermore, mtROS- 
controlled neuronal activity in the hypothalamus is sufficient to 
affect complex behaviors beyond feeding. For example, acute
activation of hypothalamic AgRP neurons rapidly alters stereo- 
typic behaviors, locomotion, and anxiety (Dietrich et al., 2015). 
Finally, we speculate that purposeful alterations in mtROS pro- 
duction to effect cellular redox signaling pathways (Figure 1) 
will regulate homeostasis in other tissues. In simple terms, it is 
reasonable to assume that cellular functions in any tissue are 
determined by fuel availability, uptake, and utilization and that 
these “fuel” principles drive and orchestrate signaling modal- 
ities, including mtROS signaling, to control homeostatic and 
adaptive responses. For example, intracellular metabolic path- 
ways have distinct and dominant impacts on various immune 
cell types (Caro-Maldonado et al., 2012; Procaccini et al., 2010),
and UCP2-dependent mtROS regulation is connected
to both adaptive and innate immune cell functions (Arsenijevic 
et al., 2000; Horvath et al., 2003; Krauss et al., 2002). How other 
cells and tissues respond to such signals and the intersection 
between cell-intrinsic signaling and control by the CNS are
exciting areas for future research. In this regard, determining
whether the aforementioned CNS processes are mediated by 
cell-non-autonomous mtROS-mediated signals similar to those 
documented in C. elegans downstream of ETC disruption and 
mtROS (Durieux et al., 2011; Schieber and Chandel, 2014) will 
be important to consider. 

Challenges in ROS Research and Ramifications of ROS 
as Central Controllers of Organismal Homeostasis 

In this Review, we have summarized how changes in mtROS 
production can impact cellular ROS thresholds and redox 
signaling events that control basal physiological functions and 
adaptive responses. As such, we argue that these pathways 
are critical for organismal homeostasis, stress responsiveness, 
health, and longevity. However, these pathways are far from un- 
derstood and are in need of more intensive study. Some current 
impediments and other relevant considerations as the field 
moves forward in this area are covered below. 

At present, it remains very difficult to effectively measure ROS 
in cells and in vivo. The use of commercially available fluorescent ROS
probes is widespread but often without the knowledge that these
do not always read out specific ROS species faithfully and are
prone to other confounding artifacts (Kalyanaraman et al., 
2012). That is not to say that these are not useful to a degree,
but they should not be used as the only line of evidence to implicate
ROS in a response. Development of better ROS assays and
probes is ongoing (Ezeriņa et al., 2014; Logan et al., 2014; Woolley
et al., 2013), yet there remains a great need for additional forward
progress in this important area.

There are important implications for ROS as physiological 
signaling molecules that impact therapeutic strategies. Two 
that immediately come to mind are the use of antioxidants and 
anti-obesity strategies that target the CNS. Antioxidants have 
long been considered potential therapeutics for a number of con- 
ditions involving oxidative stress. In general, trials using these 
have failed, likely, in part, because of unintentional inhibition of 
important basal and adaptive ROS signaling pathways. It has 
even been argued that taking antioxidants as daily dietary sup- 
plements might also perturb these ROS pathways in ways that 
are not beneficial or even harmful (Ristow, 2014). Similarly, stra- 
tegies that target activation of POMC neurons (to promote 
satiety) as an anti-obesity/anti-diabetic strategy might also be 
confounded due to perturbation of ROS signaling circuits that 
we have described herein (Dietrich and Horvath, 2012). That is, 
we assert that many compounds that activate POMC neurons 
will, by default, upregulate ROS production and signaling that 
governs their activity (Diano et al., 2011). If this scenario is main- 
tained for a prolonged period of time (hours, days, months), 
weight loss may be accomplished, but sustained ROS levels 
could have a multitude of unintended detrimental consequences. 
As these examples point out, previously held views of ROS as just 
damaging agents that need to be eliminated are out of date, and a 
new appreciation of their signaling roles is important to consider 
going forward. The fact that mitochondria are major producers of 
ROS also highlights the importance of better understanding what 
controls their activity and rate of mtROS production, both in terms 
of redox signaling and oxidative stress. In this regard, the greater 
recent appreciation of mitochondria as important signaling hubs 
(Chandel, 2014; West et al., 2011) has begun to transcend older, 
over-simplified views of these organelles as just sites of interme- 
diary metabolism and ATP production. 
SUMMARY 

The bacterium Yersinia pestis is the etiological agent 
of plague and has caused human pandemics with 
millions of deaths in historic times. How and 
when it originated remains contentious. Here, we 
report the oldest direct evidence of Yersinia pestis 
identified by ancient DNA in human teeth from Asia 
and Europe dating from 2,800 to 5,000 years ago. 
By sequencing the genomes, we find that these 
ancient plague strains are basal to all known 
Yersinia pestis. We find the origins of the Yersinia 
pestis lineage to be at least two times older than 
previous estimates. We also identify a temporal 
sequence of genetic changes that lead to increased 
virulence and the emergence of the bubonic 
plague. Our results show that plague infection 
was endemic in the human populations of Eurasia 
at least 3,000 years before any historical recordings 
of pandemics. 


INTRODUCTION 

Plague is caused by the bacterium Yersinia pestis and is transmitted 
either directly through human-to-human contact (pneumonic plague) or 
via fleas as a common vector (bubonic or septicemic plague) (Treille 
and Yersin, 1894). Three historic human plague pandemics have been 
documented: (1) the First Pandemic, which started with the Plague of 
Justinian (541-544 AD) but continued intermittently until ~750 AD; 
(2) the Second Pandemic, which began with the Black Death in Europe 
(1347-1351 AD) and included successive waves, such as the Great 
Plague (1665-1666 AD), until the 18th century; (3) the Third 
Pandemic, which emerged in China in the 1850s and erupted there in a 
major epidemic in 1894 before spreading across the world as a series 
of epidemics until the middle of the 20th century (Bos et al., 2011; 
Cui et al., 2013; Drancourt et al., 1998; Harbeck et al., 2013; 
Parkhill et al., 2001; Perry and Fetherston, 1997; Wagner et al., 
2014). Earlier outbreaks such as the Plague of Athens (430-427 BC) 
and the Antonine Plague (165-180 AD) may also have occurred, but 
there is no direct evidence that allows confident attribution to 
Y. pestis (Drancourt and Raoult, 2002; McNeill, 1976). 
The consequences of the plague pandemics have been well 
documented, and the demographic impacts were dramatic (Little 
et al., 2007). The Black Death alone is estimated to have killed 
30%-50% of the European population. Economic and political 
collapses have also been in part attributed to the devastating 
effects of the plague. The Plague of Justinian is thought to 
have played a major role in weakening the Byzantine Empire, 
and the earlier putative plagues have been associated with the 
decline of Classical Greece and likely undermined the strength 
of the Roman army. 

Molecular clock estimates have suggested that Y. pestis diver- 
sified from the more prevalent and environmental stress-tolerant, 
but less pathogenic, enteric bacterium Y. pseudotuberculosis be- 
tween 2,600 and 28,000 years ago (Achtman et al., 1999, 2004; 
Cui et al., 2013; Wagner et al., 2014). However, humans may 
potentially have been exposed to Y. pestis for much longer than 
the historical record suggests, though direct molecular evidence 
for Y. pestis has not been obtained from skeletal material older 
than 1,500 years (Bos et al., 2011; Wagner et al., 2014). The 
most basal strains of Y. pestis (0.PE7 clade) recorded to date 
were isolated from the Qinghai-Tibet Plateau in China in 1961-1962 
(Cui et al., 2013). 

We investigated the origin of Y. pestis by sequencing ancient 
bacterial genomes from the teeth of Bronze Age humans across 
Europe and Asia. Our findings suggest that the virulent, flea- 
borne Y. pestis strain that caused the historic bubonic plague 
pandemics evolved from a less pathogenic Y. pestis lineage in- 
fecting human populations long before recorded evidence of 
plague outbreaks. 

RESULTS 

Identification of Yersinia pestis in Bronze Age Eurasian 
Individuals 

We screened c. 89 billion raw DNA sequence reads obtained from teeth 
of 101 Bronze Age individuals from Europe and Asia (Allentoft et al., 
2015) and found that seven individuals carried sequences resembling 
Y. pestis (Figure 1, Table S1, Supplemental Experimental Procedures). 
Further sequencing allowed us to assemble the Y. pestis genomes to an 
average depth of 0.14-29.5X, with 12%-95% of the positions in the 
genome covered at least once (Table 1, Tables S2, S3, and S4). We 
also recovered the sequences of the three plasmids pCD1, pMT1, and 
pPCP1 (0.12 to 50.3X in average depth), the latter two of which are 
crucial for distinguishing Y. pestis from its highly similar 
ancestor Y. pseudotuberculosis (Table 1, Figure 2, Table S3) 
(Bercovier et al., 1980; Chain et al., 2004; Parkhill et al., 2001). 
The host individuals from which Y. pestis was recovered belong to 
Eurasian Late Neolithic and Bronze Age cultures (Allentoft et al., 
2015), represented by the Afanasievo culture in Altai, Siberia (2782 
cal BC, 2794 cal BC, n = 2), the Corded Ware culture in Estonia (2462 
cal BC, n = 1), the Sintashta culture in Russia (2163 cal BC, n = 1), 
the Unetice culture in Poland (2029 cal BC, n = 1), the Andronovo 
culture in Altai, Siberia (1686 cal BC, n = 1), and an early Iron Age 
individual from Armenia (951 cal BC, n = 1) (Table S1). 

Figure 1. Archaeological Sites of Bronze Age Yersinia pestis 

(A) Map of Eurasia indicating the position, radiocarbon-dated ages, 
and associated cultures of the samples in which Y. pestis was 
identified. Dates are given as 95% confidence interval calendar BC 
years. IA: Iron Age. 

(B) Burial four from Bulanovo site. Picture by Mikhail V. Khalyapin. 
See also Table S1. 

Authentication of Yersinia pestis Ancient DNA 

In addition to applying standard precautions for working with ancient 
DNA (Willerslev and Cooper, 2005), we find that the authenticity of 
our findings is supported by the following observations: (1) The 
Y. pestis sequences were identified in significant amounts in 
shotgun data from eight of 101 samples, showing that this 
finding is not due to a ubiquitous contaminant in our lab or in 
the reagents. Indeed, further analysis showed that one of these 
eight was most likely not Y. pestis. We also sequenced all negative 
DNA extraction controls and found no signs of Y. pestis DNA 
in these (Table S3). (2) Consistent with an ancient origin, the 
Y. pestis reads were highly fragmented, with average read 
lengths of 43-65 bp (Table S3), and also displayed clear signs of 
C-to-T deamination damage at the 5' termini typical of ancient 
DNA (Figure 3, Figure S1). Because the plasmids are central for 
discriminating between Y. pestis and Y. pseudotuberculosis, 
we tested separately for DNA damage patterns for the chromosome 
and for each of the plasmids. For the seven samples, we 
observe similar patterns of DNA damage for chromosome and 
plasmid sequences (Figure 3, Figure S1). (3) We observe correlated 
degradation patterns when comparing the Y. pestis sequences with 
the human sequences from the host individual. Given that DNA decay 
can be described as a rate process (Allentoft et al., 2012), this 
suggests that the DNA molecules of the pathogen and the human host 
have a 
similar age (Figure 3, Figure S1, Table S3, and Supplemental 
Experimental Procedures). 

Table 1. Overview of the Y. pestis-Containing Samples 

Sample   Country  Site             Culture      Date (cal BC)  CO92   pMT1   pPCP1  pCD1
RISE00   Estonia  Sope             Corded Ware  2575-2349       0.39   0.36   1.40   0.66
RISE139  Poland   Chociwel         Unetice      2135-1923       0.14   0.24   0.76   0.28
RISE386  Russia   Bulanovo         Sintashta    2280-2047       0.82   0.96   1.12   1.60
RISE397  Armenia  Kapan            EIA          1048-885        0.25   0.40   6.88   0.50
RISE505  Russia   Kytmanovo        Andronovo    1746-1626       8.73   9.15  34.09  17.46
RISE509  Russia   Afanasievo Gora  Afanasievo   2887-2677      29.45  16.96  31.22  50.32
RISE511  Russia   Afanasievo Gora  Afanasievo   2909-2679       0.20   0.24   1.19   0.60

The dating is direct AMS dating of bones and teeth and is given as 95% confidence interval calendar BC years (details are given in Table S1). The 
columns CO92, pMT1, pPCP1, and pCD1 correspond to sequencing depth. Additional information on the archaeological sites and mapping statistics 
can be found in the Supplemental Experimental Procedures and Tables S1, S2, and S3. EIA: Early Iron Age; AMS: Accelerator Mass Spectrometry. 

(4) Because of the high sequence 
similarity between Y. pestis and Y. pseudotuberculosis, we 
mapped all reads both to the Y. pestis CO92 and to the 
Y. pseudotuberculosis IP32953 reference genomes (Chain 
et al., 2004). Consistent with being Y. pestis, the seven 
investigated samples displayed more reads matching perfectly (edit 
distance = 0) to Y. pestis (Figure 3, Figure S2). One sample 
(RISE392) was most likely not Y. pestis based on this criterion. 
(5) A naive Bayesian classifier trained on known genomes predicts 
the seven samples to be Y. pestis with 100% posterior probability, 
while RISE392 is predicted to have 0% probability of 
being Y. pestis (Figure S2, Table S3). (6) If the DNA were from 
organisms other than Y. pestis, we would expect the reads to 
be more frequently associated with either highly conserved or 
low-complexity regions. However, we find the reads to be 
distributed across the entire genome (Figure S2), and the 
actual coverage agrees with the coverage that would be expected 
from read length distributions and mappability of the reference 
sequences for the seven samples (Figure 3). 
(7) In a maximum likelihood phylogeny, the recovered Y. pestis 
genomic sequences of RISE505 and RISE509 are clearly within 
the Y. pestis clade and basal to all contemporary Y. pestis strains 
(Figure 4) (see below). 
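
The edit-distance criterion in observation (4) can be illustrated with a minimal sketch. It assumes that the per-read edit distances against each reference have already been extracted from the alignments; the function names and the toy values are hypothetical and are not taken from the study.

```python
from collections import Counter


def edit_distance_profile(distances):
    """Return the fraction of reads at each edit distance (0 = perfect match)."""
    counts = Counter(distances)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {d: c / total for d, c in sorted(counts.items())}


def closer_reference(dist_pestis, dist_pseudo):
    """Pick the reference with the larger share of perfectly matching reads."""
    perfect_pestis = edit_distance_profile(dist_pestis).get(0, 0.0)
    perfect_pseudo = edit_distance_profile(dist_pseudo).get(0, 0.0)
    if perfect_pestis > perfect_pseudo:
        return "Y. pestis"
    if perfect_pseudo > perfect_pestis:
        return "Y. pseudotuberculosis"
    return "undetermined"


# Toy, made-up edit distances for the same six reads against each reference.
to_pestis = [0, 0, 0, 1, 0, 2]
to_pseudo = [0, 1, 1, 2, 1, 3]
print(closer_reference(to_pestis, to_pseudo))
```

A sample whose reads match neither reference preferentially would be returned as undetermined in this sketch; the study itself combined this criterion with the other observations listed above.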

The Phylogenetic Position of the Bronze Age Yersinia 
pestis Strains 

To determine the phylogenetic positions of the two high-coverage 
ancient Y. pestis strains, RISE505 (Andronovo culture, 
1686 cal BC, 8.7X) and RISE509 (Afanasievo culture, 2746 cal 
BC, 29.7X), we mapped the reads, together with reads from 
strains of Yersinia similis (n = 5), Y. pseudotuberculosis (n = 
25), and Y. pestis (n = 139), to the Y. pseudotuberculosis 
reference genome (IP32953). Only high-confidence positions were 
extracted. To assess whether the individuals were infected 
with multiple strains of Y. pestis, we investigated the genotype 
heterozygosity levels of the ancient genomes and found no 
indications of mixed infection (Figure S3). There was no decay 
in Linkage Disequilibrium (LD) across the chromosome 
(Figure S3), indicating no detectable recombination among strains. 
We therefore used RAxML (Stamatakis, 2014) to construct a 
Maximum Likelihood phylogeny from a supermatrix concatenated 
from 3,141 genes and a total of 3.14 Mbp (Figure 4). This 
contrasts with earlier phylogenies (Bos et al., 2011; Cui et al., 
2013; Morelli et al., 2010; Wagner et al., 2014), which were based 
on fewer than 2,300 nucleotides that were ascertained to be 
variable in Y. pestis, likely leading to lower statistical accuracy 
than with whole-genome analyses. Furthermore, the use of SNPs 
ascertained to be variable in Y. pestis would downwardly bias 
estimates of branch lengths in Y. pseudotuberculosis and lead 
to underestimates of the Y. pestis versus Y. pseudotuberculosis 
divergence time, as seen in the branch length of the Y. pestis 
clade to Y. pseudotuberculosis (Figure S3). The topology of our 
whole-genome tree shows Y. pestis as a monophyletic group 
within Y. pseudotuberculosis, with RISE505 and RISE509 
(Figure 4A, black arrow; Figure S4) clustered together within the 
Y. pestis clade. The Y. pestis sub-tree topology (Figure 4B, 
Figure S4) is similar to that reported previously (Bos et al., 2011; 
Cui et al., 2013; Morelli et al., 2010; Wagner et al., 2014), but 
with the two ancient strains (RISE505 and RISE509) falling basal 
to all other known strains of Y. pestis (100% bootstrap support). 
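
As an illustration of what concatenating per-gene alignments into a supermatrix involves, the following sketch joins gene alignments strain by strain and pads strains missing from a gene with gap characters. The data layout, the function name, and the padding rule are assumptions made for this example; the sketch is not the pipeline used in the study.

```python
def build_supermatrix(gene_alignments, missing="-"):
    """Concatenate per-gene alignments into one sequence per strain.

    gene_alignments maps gene name -> {strain name -> aligned sequence}.
    Strains absent from a gene are padded with the `missing` character.
    """
    strains = sorted({s for aln in gene_alignments.values() for s in aln})
    parts = {s: [] for s in strains}
    for gene in sorted(gene_alignments):
        aln = gene_alignments[gene]
        widths = {len(seq) for seq in aln.values()}
        if len(widths) != 1:
            raise ValueError(f"sequences for {gene} are not aligned to one length")
        width = widths.pop()
        for s in strains:
            parts[s].append(aln.get(s, missing * width))
    return {s: "".join(chunks) for s, chunks in parts.items()}


# Toy alignments with hypothetical gene and strain names.
genes = {
    "geneA": {"strain1": "ACGT", "strain2": "ACGA"},
    "geneB": {"strain1": "TT"},
}
print(build_supermatrix(genes))
```

The resulting per-strain sequences all have the same length, which is the property a phylogeny program needs from a concatenated alignment.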

Determination of Yersinia pestis Divergence Dates 

To determine the dates for the most recent common ancestor 
(MRCA) of Y. pestis and Y. pseudotuberculosis, and for all known 
Y. pestis strains, we used a Bayesian Markov Chain Monte Carlo 
approach implemented in BEAST2 (Bouckaert et al., 2014) on a 
subset of the supermatrix. We estimated the MRCA of Y. pestis 
and Y. pseudotuberculosis to be 54,735 years ago (95% HPD 
[highest posterior density] interval: 34,659-78,803 years ago) 
(Figure 4C, Figure S5, Table S5), which is about twice as old 
as previous estimates of 2,600-28,000 years ago 
(Achtman et al., 1999, 2004; Cui et al., 2013; Wagner et al., 
2014). Additionally, we estimated the age of the MRCA of all 
known Y. pestis to be 5,783 years ago (95% HPD interval: 
5,021-7,022 years ago). This is also significantly older, and has a 
much narrower interval, than previous findings of 
3,337 years ago (1,505-6,409 years ago) (Cui et al., 2013). 
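
For readers unfamiliar with HPD intervals, the sketch below shows one common way to obtain a 95% HPD interval from a set of posterior samples: take the shortest window that contains 95% of the sorted samples. This is a generic illustration with made-up numbers, not BEAST2's implementation and not the posterior from this study; the function name is hypothetical.

```python
import math


def hpd_interval(samples, mass=0.95):
    """Shortest interval containing at least `mass` of the samples."""
    xs = sorted(samples)
    n = len(xs)
    if n == 0:
        raise ValueError("no samples")
    k = max(1, math.ceil(mass * n))  # number of samples the interval must hold
    # Slide a window of k consecutive sorted samples and keep the narrowest one.
    start = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[start], xs[start + k - 1]


# Made-up posterior draws: evenly spread values plus one extreme outlier.
draws = list(range(100)) + [1000]
print(hpd_interval(draws))
```

Unlike an equal-tailed interval, the HPD interval excludes the isolated outlier here because including it would make the window far wider.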

Bronze Age Yersinia pestis Strains Lacking Yersinia 
Murine Toxin 

For the high-depth ancient Y. pestis genomes, we investigated 
the presence of 55 genes that have been associated with the 
virulence of Y. pestis (Figure 5A, Table S6). We found all virulence 
genes to be present, except the Yersinia murine toxin (ymt) gene 
that is located at 74.4-76.2 kb on the pMT1 plasmid (Figure 2C, 
arrow 1). The ymt gene encodes a phospholipase D that protects 
Y. pestis inside the flea gut, thus enabling this enteric bacterium 
to use an arthropod as vector; it further allows for higher titers 
of Y. pestis and higher transmission rates (Hinnebusch, 2005; 
Hinnebusch et al., 2002). When investigating all seven samples 
for the presence of ymt, we identified a 19 kb region (59-78 kb; 
Figure 2C, arrows 2-3; Figure 5B) to be missing except in the 
youngest sample (RISE397, 951 cal BC) (Figure 5B, Table S7). We find 
this region to be present in all other published Y. pestis strains 
(modern and ancient), except three strains (5761, 945, and 
CA88) that are lacking the pMT1 plasmid completely. 

Figure 2. Y. pestis Depth of Coverage Plots 

(A-D) Depth of coverage plots for (A) CO92 chromosome, (B) pCD1, (C) pMT1, (D) pPCP1. Outer ring: Mappability (gray), genes (RNA: black, transposon: purple, 
positive strand: blue, negative strand: red), RISE505 (blue), RISE509 (blue), Justinian plague (orange), Black Death plague (purple), modern Y. pestis D1982001 
(green), Y. pseudotuberculosis IP32881 (red) sample. The modern Y. pestis and Y. pseudotuberculosis samples are included for reference. The histograms show 
sequence depth in 1 kb windows for the chromosome and 100 bp windows for the plasmids with a max of 20X depth for each ring. Arrow 1: ymt gene, arrow 2: 
transposon at start of missing region on pMT1, arrow 3: transposon at end of missing region on pMT1, arrow 4: pla gene, arrow 5: missing flagellin region on 
chromosome. The plots were generated using Circos (Krzywinski et al., 2009). See also Tables S2, S3 and S8. 

Although larger sample sizes are needed for confirmation, our 
data indicate that the ymt gene was not present in Y. pestis 
before 1686 cal BC (n = 6), while after 951 cal BC, it is found in 
97.8% of the strains (n = 140), suggesting a late and very rapid 
spread of ymt. This contrasts with previous studies arguing 
that the ymt gene was acquired early in Y. pestis evolution due 
to its importance in its life cycle (Carniel, 2003; Hinnebusch, 
2005; Hinnebusch et al., 2002; Sun et al., 2014). Interestingly, 
we identified two transposase elements flanking the missing 
19 kb region, confirming that the ymt gene was acquired through 
horizontal gene transfer, as previously suggested (Lindler et al., 
1998). Moreover, it has recently been shown that the transmission 
of Y. pestis by fleas is also dependent on loss-of-function 
mutations in the pde2, pde3, and rcsA genes (Sun et al., 2014). 
The RISE509 sample carries the promoter mutation of pde3 
and the functional pde2 and rcsA alleles (Figure S6). In combination 
with the absence of ymt, these results strongly suggest that 
the ancestral Y. pestis bacteria in these early Bronze Age 
individuals were not transmitted by fleas. 

Native Plasminogen Activator Gene Present in Bronze 
Age Yersinia pestis 

Another hallmark gene of Y. pestis pathogenicity is the plasminogen 
activator gene pla (omptin protein family), located on 
the pPCP1 plasmid (6.6-7.6 kb). The gene facilitates deep tissue 
invasion and is essential for development of both bubonic and 
pneumonic plague (Sebbane et al., 2006; Sodeinde et al., 
1992; Zimbler et al., 2015). We identify the gene in six of the 
seven genomes, but not in RISE139, the sample with the lowest 
overall depth of coverage (0.75X on pPCP1) (Figure 2D, arrow 4, 
Table S6). Recently, it has been proposed that pPCP1 was 
acquired after the branching of the 0.PE2 clade (Zimbler et al., 
2015); however, we identified pPCP1 in our samples, including 
in the 0.PE7 clade (strains 620024 and CMCC05009), which 
diverged prior to the common ancestor of the 0.PE2 lineage 
(Figure 4B, Figure 5A). This shows that pPCP1 and pla likely were 
present in the most basal Y. pestis (RISE509), suggesting that 
the 0.PE2 strains lost the pPCP1 plasmid. Interestingly, three 
2.ANT3 strains (5761, CMCC64001, and 735) are also missing 
the pla gene, indicating that the loss of pPCP1 occurred more 
than once in the evolutionary history of Y. pestis. 

Additionally, we investigated whether RISE397, RISE505, and 
RISE509 had the isoleucine to threonine mutation at amino acid 
259 in the Pla protein. This mutation has been shown to be 
essential for developing bubonic, but not pneumonic, plague 
(Zimbler et al., 2015). We found that these samples, in agreement 
with their basal phylogenetic position, carry the ancestral 
isoleucine residue. However, we also identified a valine to isoleucine 
mutation at residue 31 for RISE505 (1686 cal BC) and RISE509 
(2746 cal BC). This mutation was not found in any of the other 
140 Y. pestis strains, but was present in other omptin proteins, 
such as those of Escherichia coli and Citrobacter koseri, and very 
likely represents the ancestral Y. pestis state. The youngest of the 
samples, RISE397 (951 cal BC), carries the derived isoleucine 
residue, showing that this mutation, similar to the acquisition of 
ymt, was only observed after 1686 cal BC. 

An alternative explanation to the acquisition of ymt and the pla 
I259T mutation, given the disparate geographical locations of 
our samples, could be that the Armenian strain (RISE397, 951 
cal BC) containing ymt and the isoleucine residue in pla had a 
longer history in the Middle East and experienced an expansion 
during the 1st millennium BC. This would have led to its export to 
Eurasia and presumably the extinction of the other more ancestral 
and less virulent Y. pestis strains. 



Different Region 4 Present in the Ancestral Yersinia 
pestis 

Besides the 55 pathogenicity genes, we also investigated the 
presence of different region 4 (DFR4), which contains several genes 
with a potential role in Y. pestis virulence (Radnedge et al., 2002). 
This region was reported as present in the Plague of Justinian 
and Black Death strains, having been lost in the CO92 reference 
genome (from the Third Pandemic) (Chain et al., 2004; Wagner 
et al., 2014). Consistent with the ancestral position of our 
samples, we find evidence that the region is present in all seven 
of our samples (Figure S6). 

Yersinia pestis Flagellar Frameshift Mutation Absent in 
Bronze Age Strains 

Another important feature of Y. pestis is the ability to evade the 
mammalian immune system. Flagellin is a potent initiator of the 
mammalian innate immune system (Hayashi et al., 2001). 
Y. pseudotuberculosis is known to downregulate expression 
of flagellar systems in a temperature-dependent manner, and 
none of the known Y. pestis strains express flagellin due to a 
frameshift mutation in the flhD regulatory gene (Minnich and 
Rohde, 2007). However, we do not find this mutation in either 
RISE505 or RISE509, suggesting that they have fully functional 
flhD genes and that the loss of function occurred after 2746 cal 
BC. Interestingly, the younger of these two Y. pestis genomes 
(RISE505, 1686 cal BC) shows partial loss of one of the two 
flagella systems (758-806 kb), with 39 of 49 genes deleted 
(Figure 2A, arrow 5, Table S8). This deletion was not found in any of 
the other Y. pestis samples (n = 147). This may point to selective 
pressure on ancestral Y. pestis when emerging as a mammalian 
pathogen, yielding variably adaptive strains. 

DISCUSSION 

Our calibrated molecular clock pushes the divergence dates for 
the early branching of Y. pestis back to 5,783 years ago, an 
additional 2,000 years compared to previous findings (Table S5, 
Figure S5) (Cui et al., 2013; Morelli et al., 2010). Furthermore, using 
the temporally stamped ancient DNA data, we are able to derive 
a time series for the molecular acquisition of the pathogenicity 
elements and immune avoidance systems that facilitated the 
evolution from a less virulent bacterium with zoonotic potential, 
such as Y. pseudotuberculosis, to one of the most deadly bacteria 
ever encountered by humans (Figure 6). 

From our findings, we conclude that the ancestor of extant 
Y. pestis strains was present by the end of the 4th millennium 
BC and was widely spread across Eurasia from at least the early 
3rd millennium BC. The occurrence of plague in the Bronze 
Age Eurasian individuals we sampled (7 of 101) indicates that 
plague infections were common at least 3,000 years earlier 
than recorded historically. However, based on the absence of 
crucial virulence genes, unlike the later Y. pestis strains that 
were responsible for the first to third pandemics, these ancient 
ancestral Y. pestis strains likely did not have the ability to cause 
bubonic plague, only pneumonic and septicemic plague. These 
early plagues may have been responsible for the suggested 
population declines in the late 4th millennium BC and the early 
3rd millennium BC (Hinz et al., 2012; Shennan et al., 2013). 
It has recently been demonstrated by ancient genomics 
that the Bronze Age in Europe and Asia was characterized 
by large-scale population movements, admixture, and 
replacements (Allentoft et al., 2015; Haak et al., 2015), which 
accompanied profound and archaeologically well-described 
social and economic changes (Anthony, 2007; Kristiansen 
and Larsson, 2005). In light of our findings, it is plausible 
that plague outbreaks could have facilitated, or have been 
facilitated by, these highly dynamic demographic events. 
However, our data suggest that Y. pestis did not fully adapt 
as a flea-borne mammalian pathogen until the beginning of 
the 1st millennium BC, which precipitated the historically 
recorded plagues. 

EXPERIMENTAL PROCEDURES 
Samples and Archaeological Sites 

We initially re-analyzed the data from Allentoft et al. (2015) and 
identified Y. pestis DNA sequences in 7 of the 101 individuals. Descriptions of 
the archaeological sites are given in Supplemental Experimental Procedures 
and Table S1. 

Generation of Additional Sequence Data 

In order to increase the depth of coverage on the Y. pestis genomes, we 
performed additional sequencing on these seven DNA extracts. Library 
construction was conducted as in Allentoft et al. (2015). Briefly, 
double-stranded and blunt-ended 
DNA libraries were prepared using the NEBNext DNA Sample Prep Master 
Mix Set 2 (E6070) and Illumina-specific adapters (Meyer and Kircher, 2010). 
The libraries were shotgun sequenced in two pools on Illumina HiSeq2500 
platforms using 100-bp single-read chemistry. We sequenced 32 lanes, 
generating a total of 11.2 billion new DNA sequences for this study. Reads for the 
seven Y. pestis samples are available from ENA: PRJEB10885. Individual 
sample accession numbers are available in Table S2. 

Creation of Database for Identification of Y. pestis Reads 

To identify Y. pestis reads in the Bronze Age dataset (Allentoft et al., 2015), we 
first created a database of all previously sequenced Y. pestis strains (n = 140), 
Y. pseudotuberculosis strains (n = 30), Y. similis strains (n = 5), and a selection 
of Y. enterocolitica strains (n = 4) (Supplemental Experimental Procedures and 
Table S2). The genomes were either downloaded from NCBI or downloaded as 
reads and de novo assembled using SPAdes-3.5.0 (Bankevich et al., 2012) 
with the --careful and --cov-cutoff auto options. 

Identification and Assembly of Y. pestis From Ancient Samples 

Raw reads were trimmed for adaptor sequences using AdapterRemoval-1.5.4 
(Lindgreen, 2012). Additionally, leading and trailing Ns were removed, 
as well as bases with quality 2 or less. Thereafter, the trimmed reads 
with a length of at least 30 nt were mapped using bwa mem (local 
alignment) (Li and Durbin, 2009) to the database of Y. pestis, 
Y. pseudotuberculosis, Y. similis, and Y. enterocolitica mentioned above. 
Reads with a match to any of the sequences in this database were aligned 
separately to three different reference genomes: Yersinia pestis CO92 
genome including the associated plasmids pCD1, pMT1, and pPCP1 (Parkhill 
et al., 2001); Yersinia pseudotuberculosis IP32953 including the 
associated plasmids (Chain et al., 2004); and Yersinia pestis biovar 
Microtus 91001 and associated plasmids (Zhou et al., 2004). This 
alignment was performed using bwa aln (Li and Durbin, 2009) with the 
seed option disabled for better sensitivity for ancient data, enforcing 
global alignment of the read to the reference genome. Each sequencing 
run was merged to library level and duplicates were removed using 
Picard-1.124 (http://broadinstitute.github.io/picard/), followed by 
merging to per-sample alignment files. These files were filtered for a 
mapping quality of 30 to retain only high-quality alignments, and the 
base qualities were re-scaled for DNA damage using MapDamage 2.0 
(Jonsson et al., 2013). We defined Y. pestis as present in a sample if 
the mapped depth of the CO92 reference sequences was greater than or 
equal to 0.1X and if the reads covered at least 10% of the chromosome 
and each of the plasmids. The assembly of the Justinian, Black Death, 
and modern samples was performed similarly and is described in detail 
in the Supplemental Experimental Procedures. 
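
The presence criterion stated above can be written as a simple rule. The sketch below is one possible reading, in which both thresholds (depth of at least 0.1X and at least 10% of positions covered) are applied to the chromosome and to each plasmid; whether the depth threshold was applied per replicon is an assumption. The function name and the covered fractions in the example are hypothetical, while the depths are those listed for RISE139 in Table 1.

```python
REPLICONS = ("CO92", "pCD1", "pMT1", "pPCP1")


def pestis_present(stats, min_depth=0.1, min_covered=0.10):
    """Apply the depth and breadth thresholds to every replicon.

    stats maps replicon name -> (mean depth in X, fraction of positions covered).
    A replicon that is absent from stats counts as failing the criterion.
    """
    for name in REPLICONS:
        if name not in stats:
            return False
        depth, covered = stats[name]
        if depth < min_depth or covered < min_covered:
            return False
    return True


# Depths from Table 1 (RISE139); the covered fractions are invented placeholders.
rise139 = {
    "CO92": (0.14, 0.12),
    "pMT1": (0.24, 0.20),
    "pPCP1": (0.76, 0.50),
    "pCD1": (0.28, 0.22),
}
print(pestis_present(rise139))
```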

Coverage, Depth and Mappability Analyses 

We calculated the coverage of the individual sample alignments versus 
the Y. pestis CO92 reference genome using Bedtools (Quinlan and Hall, 
2010) and plotted this using Circos (Krzywinski et al., 2009). For the 
chromosome, the coverage was calculated in 1 kbp windows and for the 
plasmids in 100 bp windows. Mappability was calculated using the 
GEM-mappability library with a k-mer size of 50, which is similar to the 
average length of the trimmed and mapped Y. pestis reads (average length 
43-65 bp). Statistics of the coverage and depth are given in Tables S3 
and S4. 
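
To make the windowed calculation concrete, the sketch below computes mean depth in fixed-size windows and the fraction of positions covered at least once from a list of per-base depths. It is a plain-Python stand-in written for illustration, not the Bedtools workflow used in the study, and the function names and toy depths are hypothetical.

```python
def windowed_depth(per_base_depth, window):
    """Mean depth in consecutive non-overlapping windows (last one may be shorter)."""
    means = []
    for start in range(0, len(per_base_depth), window):
        chunk = per_base_depth[start:start + window]
        means.append(sum(chunk) / len(chunk))
    return means


def breadth_of_coverage(per_base_depth):
    """Fraction of positions covered by at least one read."""
    if not per_base_depth:
        return 0.0
    covered = sum(1 for d in per_base_depth if d > 0)
    return covered / len(per_base_depth)


# Toy per-base depths for a 7-bp sequence, summarized in 2-bp windows.
depths = [0, 0, 2, 2, 4, 4, 1]
print(windowed_depth(depths, 2))
print(breadth_of_coverage(depths))
```

For the analysis described here, the window would be 1,000 for the chromosome and 100 for the plasmids.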

DNA Decay Rates 

We investigated the molecular degradation signals obtained from the 
sequencing data. Based on the negative exponential relationship between 
frequency and sequence length, we estimated for each sample the DNA 
damage fraction (λ, per bond), the average fragment length (1/λ), the DNA 
decay rate (k, per bond per year), and the molecular half-lives of 100 bp 
fragments (Allentoft et al., 2012). We compared these DNA decay estimates for 
Y. pestis to the decay of endogenous human DNA from the host individuals. 
If the plague DNA is authentic and ancient, a correlation is expected between 
the rate of DNA decay in the human host and in Y. pestis, because the 
DNA has been exposed to similar environmental conditions for the same 
amount of time. See Supplemental Experimental Procedures for additional 
information. 
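
The quantities named above can be illustrated with a minimal sketch. It assumes that fragment counts decline as exp(-lam x length) over the fitted range, that the per-year decay rate is the damage fraction divided by the sample age, and that a fragment of L bp is lost at roughly L times the per-bond rate. These are simplifying assumptions made for the example; the function names and the synthetic data are hypothetical, and the exact formulas are those of Allentoft et al. (2012), which should be consulted for the actual procedure.

```python
import math


def fit_damage_fraction(lengths, counts):
    """Least-squares fit of log(count) against fragment length.

    Returns lam, the damage fraction per bond, as minus the fitted slope.
    """
    logs = [math.log(c) for c in counts]
    n = len(lengths)
    mean_x = sum(lengths) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in lengths)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(lengths, logs))
    return -sxy / sxx


def decay_summary(lam, age_years, fragment_bp=100):
    """Derive mean fragment length, decay rate, and an approximate half-life."""
    k = lam / age_years  # assumed: damage accumulates at a constant yearly rate
    return {
        "mean_fragment_length": 1.0 / lam,
        "decay_rate_per_bond_per_year": k,
        # assumed: a fragment of fragment_bp bonds decays at about k * fragment_bp
        "half_life_years": math.log(2) / (k * fragment_bp),
    }


# Synthetic length distribution generated with lam = 0.02 (not real data).
lengths = list(range(60, 121))
counts = [5000.0 * math.exp(-0.02 * length) for length in lengths]
lam = fit_damage_fraction(lengths, counts)
print(decay_summary(lam, age_years=4000))
```

Fitting the pathogen reads and the host reads separately and comparing the two fitted values is the kind of comparison the correlation test above relies on.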



Figure 3. Authenticity of Y. pestis DNA 

(A) DNA damage patterns for RISE505 and RISE509. The frequencies of all possible mismatches observed between the Y. pestis C092 chromosome and the 
reads are reported in gray as a function of distance from 5' (left panel, first 25 nucleotides sequenced) and distance to 3' (right panel, last 25 nucleotides). The 
typical DNA damage mutations C>T (5') and G>A (3') are reported in red and blue, respectively. 

(B) Ancient DNA damage patterns (n = 7) of the reads aligned to the C092 chromosome and the Y. pestis associated plasmids pMTI , pCDI and pPCPI . The 
boxplots show the distribution of C-T damage in the 5' of the reads. The lower and upper hinges of the boxes correspond to the 25th and 75th percentiles, the 
whiskers represent the 1 .5 inter-quartile range (IQR) extending from the hinges, and the dots represent outliers from these. 

(C) DNA fragment length distributions from RISE505 and RISE509 samples representing both the Y. pestis DNA and the DNA of the human host. The declining part 
of the distributions is fitted to an exponential model (red). 

(D) Linear correlation (red) between the decay constant in the DNA of the human host and the associated Y. pestis DNA extracted from the same individual 
(R² = 0.55, p = 0.055). The decay constant (λ) describes the damage fraction (i.e., the fraction of broken bonds on the DNA strand). 

(E) Distribution of edit distance of high quality reads from RISE505 and RISE509 samples mapped to either Y. pestis (dark gray) or Y. pseudotuberculosis (light 
gray) reference genomes. The reads have a higher affinity to Y. pestis than to Y. pseudotuberculosis. 

(F) Plots of actual coverage versus expected coverage for the 101 screened samples. Expected coverage was computed taking into account read length 
distributions, mappable fractions of reference sequences, and the deletions in pMT1 for some of the samples. Samples assumed to contain Y. pestis are shown in 
blue, and RISE392, which is classified as not Y. pestis, is shown in red. See also Figures S1 and S2 and Table S3. 
Figure 4. Phylogenetic Reconstructions 

(A) Maximum Likelihood reconstruction of the 
phylogeny of Y. pseudotuberculosis (blue) and 
Y. pestis (red). The tree is rooted using Y. similis 
(not shown). The full tree including three additional 
Y. pseudotuberculosis strains (O:15 serovar) can be 
seen in Figure S4. Major branching nodes within 
Y. pseudotuberculosis with > 95% bootstrap support 
are indicated with an asterisk and branch lengths are 
given as substitutions per site. 

(B) Maximum Likelihood reconstruction of the 
phylogeny in (A) showing only the Y. pestis clade. The 
clades are collapsed by population according to 
branches and serovars, as given in (Achtman et al., 
1999, 2004; Cui et al., 2013). See Figure S4 for an 
uncollapsed tree and Table S2 for details on pop- 
ulations. Nodes with more than 95% bootstrap 
support are indicated with an asterisk and branch 
lengths are given as substitutions per site. 

(C) BEAST2 maximum clade credibility tree showing 
median divergence dates. Branch lengths are 
given as years before the present (see Divergence 
estimations in Experimental Procedures). Only the 
Y. pseudotuberculosis (blue), the ancient Y. pestis 
samples (magenta) and the most basal branch 
0 strains (black) are shown. For a full tree including all 
Y. pestis, see Figure S5. See also Figures S3, S4, and 
S5 and Table S5. 



Comparison of Samples to Y. pestis and Y. pseudotuberculosis 
Reference Genomes 

We used the alignments of several sets of reads (Y. pestis, 
Y. pseudotuberculosis, and Y. similis) to the Y. pestis CO92 and the 
Y. pseudotuberculosis IP32953 genomes. Per sample we determined the 
distribution of edit distances (mismatches) of the reads versus the particular 
reference genome. We used these distributions to build a Naive Bayesian 
classifier to classify whether reads originated from Y. pestis, 
Y. pseudotuberculosis, or Y. similis. See Supplemental Experimental Procedures 
and Table S3. 

Expected versus Actual Coverage 

We estimated the expected coverage of Y. pestis given a specific sequencing 
depth and correlated that with the actual coverage of a genome per sample. 
Expected coverage was calculated as 

[equation missing in source] 

where the reads have N different lengths, l_1 to l_N, with counts r_1 to r_N. 
To account for mappability we determined the mappable fraction for each 
reference sequence using k-mers of length 40, 50, and 60, and then used the 
mappability value with the k-mer length closest to the actual average read 
length for each sample/reference combination. For more information see 
Supplemental Experimental Procedures. 
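
The expected-coverage equation itself is not reproduced in this copy of the text, so the Python sketch below should not be read as the authors' formula. It uses a common random-placement approximation instead: if reads land uniformly at random on a genome of size G, a given position is missed by one read of length l with probability (1 - l/G), and the expected breadth of coverage is the mappable fraction times one minus the product of those miss probabilities over all reads. The function names, genome size, and mappability values are hypothetical.

```python
import math

def expected_breadth(length_counts, genome_size, mappable_fraction=1.0):
    """Expected covered fraction under uniform random read placement (assumed model).

    length_counts maps read length l_i to its count r_i.
    """
    log_miss = 0.0
    for length, count in length_counts.items():
        # log of (1 - l_i / G) ** r_i, accumulated in log space for stability
        log_miss += count * math.log1p(-length / genome_size)
    return mappable_fraction * (1.0 - math.exp(log_miss))

def closest_kmer_mappability(mean_read_length, mappability_by_k):
    """Pick the mappability value whose k-mer length is nearest the mean read length."""
    k = min(mappability_by_k, key=lambda kk: abs(kk - mean_read_length))
    return mappability_by_k[k]

# Hypothetical inputs for illustration only
length_counts = {50: 10000, 60: 5000}
mappability_by_k = {40: 0.95, 50: 0.97, 60: 0.98}
mean_len = sum(l * r for l, r in length_counts.items()) / sum(length_counts.values())
m = closest_kmer_mappability(mean_len, mappability_by_k)
breadth = expected_breadth(length_counts, genome_size=1_000_000, mappable_fraction=m)
```

Comparing a value like `breadth` with the observed fraction of the reference covered by at least one read gives the kind of expected-versus-actual comparison plotted in Figure 3F.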



Genotyping For Phylogenetic Analyses 

Alignments of all strains versus Y. pseudotuberculosis 
IP32953 were used as the reference for genotyping the 
consensus sequences for all samples used in the 
phylogeny. The samples were genotyped individually 
using samtools-0.1.18 and bcftools-0.1.17 (Li et al., 
2009) and thereafter filtered (Supplemental Experimental Procedures). Based 
on Y. pseudotuberculosis IP32953 gene annotations, the consensus se- 
quences for each gene and sample were extracted. Because of the divergence 
between Y. pestis and Y. pseudotuberculosis, a number of gene sequences 
displayed high rates of missing bases and we removed genes where 20 or 
more modern Y. pestis samples had >10% missingness. This corresponded 
to a total of 985 genes, leaving data from 3,141 genes that were merged into 
Figure 5. Identification of Virulence Genes 

(A) Gene coverage heatmap of 55 virulence genes (rows) 
in 140 Y. pestis strains (columns). Sample ordering is 
based on hierarchical clustering (not shown) of the gene 
coverage distributions. RISE505 and RISE509 are 
marked with a red asterisk. Coloring goes from 0% gene 
coverage (white) to 100% gene coverage (blue). 

(B) Depth of coverage of high quality reads mapping 
across pMT1. The outer ring is mappability (gray), followed by genes 
(RNA: black, transposon: purple, positive strand: blue, 
negative strand: red) and then the RISE samples ordered 
by direct AMS dating. The sample order is RISE509, 
RISE511, RISE00, RISE386, RISE139, RISE505, and 
RISE397. See also Figure S6 and Tables S2, S6, and S7. 
AMS: Accelerator Mass Spectrometry. 




a supermatrix. We created two different supermatrices, one with Y. similis, 
Y. pseudotuberculosis, and Y. pestis containing 173 taxa x 3,141 genes that 
was used for the initial phylogeny (Figure 4A). The second supermatrix 
consisted of all Y. pestis strains and the genomes from the two closest 
Y. pseudotuberculosis clades, which was used for the divergence time 
estimations. 
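
The gene filter described above (dropping genes where 20 or more modern Y. pestis samples had more than 10% missing bases) can be sketched as follows. The data layout, function names, and the set of characters treated as missing are assumptions for illustration; the toy call uses a lower sample threshold so that the example stays small.

```python
def missing_fraction(seq, missing_chars="N-?"):
    """Fraction of positions in a consensus sequence that are uncalled."""
    if not seq:
        return 1.0  # an absent sequence is treated as fully missing
    return sum(ch in missing_chars for ch in seq.upper()) / len(seq)

def filter_genes(gene_seqs, modern_samples, max_missing=0.10, max_bad_samples=20):
    """Keep genes where fewer than max_bad_samples modern samples exceed max_missing.

    gene_seqs maps gene name -> {sample name -> consensus sequence}.
    """
    kept = []
    for gene, per_sample in gene_seqs.items():
        bad = sum(
            1 for s in modern_samples
            if missing_fraction(per_sample.get(s, "")) > max_missing
        )
        if bad < max_bad_samples:
            kept.append(gene)
    return kept

# Toy data (hypothetical gene and sample names)
modern = ["s1", "s2", "s3"]
genes = {
    "geneA": {"s1": "ACGTACGTAC", "s2": "ACGTACGTAC", "s3": "ACGTNNNNAC"},
    "geneB": {"s1": "NNNNNCGTAC", "s2": "ACNNNNNTAC", "s3": "ACGTACGTAC"},
}
kept = filter_genes(genes, modern, max_bad_samples=2)
```

The retained genes would then be concatenated per sample to form a supermatrix.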

Phylogenetics 

The alignments were partitioned by codon position and analyzed with 
jModelTest-2.1.7 (Darriba et al., 2012) to test for the best fitting substitution model. All 
decision criteria (Akaike, Bayesian, and Decision theory) found the Generalized 
Time Reversible substitution model with gamma distributed rates, using 
four rate categories, and a proportion of invariable sites (GTR+G+I) to be the 
best fit for each of the three codon partitions. To test for recombination across 
the chromosome we estimated linkage disequilibrium (LD) using 141 Y. pestis 
strains. A total of 482 bi-allelic single nucleotide variations (SNVs), with a minor 
allele frequency of 5% or higher were extracted. For all pairs of the extracted 



SNVs, the LD was calculated using PLINK 1.9 (Chang 
et al., 2015) and plotted against the physical distance 
between the pairs. We reconstructed the phylogeny from 
the codon-partitioned supermatrix using RAxML-8.1.15 
(Stamatakis, 2014) with the GTR+G+I substitution model. 
Bootstraps were performed by generating 100 bootstrap 
replicates and their corresponding parsimony starting 
trees using RAxML. Hereafter, a standard Maximum 
Likelihood inference was run on each bootstrap replicate, 
and the resulting best trees were merged and drawn on 
the best ML tree. Initial phylogenies placed the Y. pestis 
Harbin strain with an unusually long branch inside the 
1.ORI clade and it was excluded from further analysis. 
Additionally, Y. pseudotuberculosis SP93422 (serotype 
O:15), Y. pseudotuberculosis WP-931201 (serotype 
O:15), and Y. pseudotuberculosis Y248 (serotype 
unknown) were in a clade with long branch lengths and 
were therefore also omitted (see Figure S4). 
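
To make the recombination check in this section concrete, the sketch below computes pairwise linkage disequilibrium (r²) between bi-allelic SNVs after a 5% minor allele frequency filter and pairs each value with the physical distance between the two sites. The authors used PLINK 1.9; this pure-Python version is only an illustration of the quantity being plotted, it assumes haploid 0/1 genotype calls, and all names and data in it are hypothetical.

```python
from itertools import combinations

def allele_freq(calls):
    """Frequency of the allele coded 1 across strains (haploid 0/1 calls)."""
    return sum(calls) / len(calls)

def r_squared(a, b):
    """Squared correlation between two bi-allelic sites across the same strains."""
    n = len(a)
    pa, pb = sum(a) / n, sum(b) / n
    pab = sum(1 for x, y in zip(a, b) if x == 1 and y == 1) / n
    denom = pa * (1 - pa) * pb * (1 - pb)
    if denom == 0:
        return float("nan")  # undefined when either site is monomorphic
    return (pab - pa * pb) ** 2 / denom

def ld_by_distance(positions, genotypes, min_maf=0.05):
    """Return (distance, r2) for all pairs of SNVs passing the MAF filter."""
    keep = [
        i for i, g in enumerate(genotypes)
        if min(allele_freq(g), 1 - allele_freq(g)) >= min_maf
    ]
    return [
        (abs(positions[i] - positions[j]), r_squared(genotypes[i], genotypes[j]))
        for i, j in combinations(keep, 2)
    ]

# Toy data: three SNVs typed in four strains
positions = [100, 600, 2100]
genotypes = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 0, 1]]
pairs = ld_by_distance(positions, genotypes)
```

In a largely clonal organism, r² would be expected to stay high regardless of distance, whereas recombination would make it decay as distance grows.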

Heterozygosity Estimates 

We determined heterozygosity by down-sampling the 
Y. pestis BAM files to the same average depth as the 
corresponding RISE samples, genotyped each of the samples, 
and extracted heterozygote calls with a depth equal to or 
higher than 10. All transitions were excluded. See 
Supplemental Experimental Procedures for detailed information. 

Divergence Estimations 

To date the divergence time for Y. pestis and nodes within 
the Y. pestis clade we performed Bayesian Markov Chain 
Monte Carlo simulations using BEAST-2.3.0 (Bouck- 
aert et al., 2014) and the BEAGLE library v2.1.2 (Ayres 
et al., 2012). We used the codon-partitioned supermatrix that included the 
two closest Y. pseudotuberculosis clades, with unlinked substitution models, 
GTR+G+I with eight gamma rate categories and unlinked clock models. Dates 
were set as years ago with the RISE509, RISE505, Justinian, and Black Death 
samples set to 4,761, 3,701, 1,474, and 667 years ago, respectively. All unknown 
dates were set to 0 years ago. We followed previous work (Cui et al., 2013; 
Wagner et al., 2014) and applied a lognormal relaxed clock, assuming a constant 
population size. We re-rooted the ML tree from RAxML so that the root was placed 
between the two Y. pseudotuberculosis clades (IP32953, 260, IH111554) and 
(IP32921, IP32881, IP32463) and used this as the starting tree. Based on 
the ML tree we defined the closest Y. pseudotuberculosis clade (IP32921, 
IP32881, IP32463) and the Y. pestis clade as a monophyletic group and defined 
a uniform prior with 1,000 and 100,000 years as minimum and maximum bounds. 
We ran 20 independent parallel BEAST chains sampling every 2,000 states for 
between 52 and 64 million states using a total of 240,000 core hours. The chains 
were combined using LogCombiner discarding the initial 10 million states as 
burn-in. The combined post burn-in data represented 961 million states and 
Figure 6. Schematic of Y. pestis Evolution 

Representation of Y. pestis phylogeny and important 
evolutionary events since divergence from 
Y. pseudotuberculosis. Genetic gains (blue) and 
genetic loss or loss of function mutations (red) are 
indicated by arrows. Historically recorded pandemics 
are indicated in blue text. The calendar years 
indicate the primary outbreak of the pandemic. Node 
dates are median divergence times from the BEAST 
analysis. The events are based on information from 
this study and Sun et al., 2014. We used the VCFs 
generated from all Y. pestis samples (n = 142) (Table 
S2) to verify on which branches the genetic events 
occurred. The figure is based on current knowledge 
and is subject to change with addition of new 
samples. See also Figure S5 and Table S5. BA: 
Bronze Age, CHN: China, FSU: Former Soviet Union, 
AFR: Africa, GER: Germany, MON: Mongolia, 
IRN: Iran, ENG: England, flea tran: flea transmission, 
mut.: mutation. 



the effective sample size (ESS) for the posterior was 398, for the TreeHeight 238, 
and for the MRCA for Y. pseudotuberculosis and Y. pestis 216. All other 
parameters had ESS > 125. We then sampled 1/5 of the trees from each 
chain and combined them for a total of 192,406 trees that were summarized 
using TreeAnnotator, producing a maximum clade credibility tree of median 
heights. We additionally ran BEAST2 sampling the priors only (and disregarding 
sequence information) and found the posterior distribution no different from the 
priors used. This suggests that the posterior distributions recovered when 
considering full sequence alignments are driven by the sequence information and are 
not mere by-products of the sampling structure in our dataset (Figure S5). 

Analysis of Virulence Associated Genes 

To assess the potential virulence of the ancient Y. pestis strains, we identified 
55 genes previously reported to be associated with virulence of Y. pestis (Sup- 
plemental Experimental Procedures and Table S6 for details). Based on the 
alignments to the Y. pestis CO92 reference genome we determined the fraction 
of each gene sequence that was covered by at least one read for each 
Y. pestis sample. Additionally, because the different region 4 (DFR4) (Radnedge 
et al., 2002) has been associated with virulence, but is not present in the CO92 
genome, we used the alignments to Y. pestis microtus 91001 to determine the 
presence of this region (Supplemental Experimental Procedures). We note that 
the absence of KIM pPCP1 is due to it being missing from the reference 
genome, but that it has been reported to be present in KIM strains (Hu et al., 
1998). The genotypes were generated as described above and the variant 
call format (VCF) files from these analyses are available at 
http://www.cbs.dtu.dk/suppl/plague/. For detailed information on genotyping of pde2, pde3, 
rscA, pla, and flhD see Supplemental Experimental Procedures. 
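
The per-gene coverage measure used here (the fraction of each gene covered by at least one read) can be sketched in a few lines of Python. The input layout, a per-base depth list for each sample plus half-open gene coordinates, and all names are assumptions for illustration rather than the authors' implementation.

```python
def gene_coverage_fraction(depth, start, end):
    """Fraction of positions in [start, end) with read depth of at least 1."""
    region = depth[start:end]
    if not region:
        return 0.0
    return sum(1 for d in region if d >= 1) / len(region)

def coverage_matrix(depth_by_sample, genes):
    """Build {gene: {sample: covered fraction}}, e.g., as input to a heatmap."""
    return {
        gene: {
            sample: gene_coverage_fraction(depth, start, end)
            for sample, depth in depth_by_sample.items()
        }
        for gene, (start, end) in genes.items()
    }

# Toy data: per-base depth along a 10 bp reference for two hypothetical samples
depth_by_sample = {
    "sampleA": [0, 0, 1, 2, 3, 0, 1, 1, 0, 0],
    "sampleB": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
}
genes = {"gene1": (2, 8)}
matrix = coverage_matrix(depth_by_sample, genes)
```

A matrix of this shape, with genes as rows and samples as columns, is what a heatmap such as Figure 5A displays.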

Identification of the Missing ymt Region on pMT1 

Most of the regions that were unmapped could be associated with low 
mappability. However, we identified a region from 59-78 kb on pMT1 that could not 
be explained by low mappability. From the depth of coverage this region was 
absent in all of our ancient plague genomes, except for RISE397 (Figure 5). We 
tested for the significance of this by comparing the distribution of gene depths 
within and outside of the missing region using the Wilcoxon rank-sum test 
(Table S7). For all samples except RISE397 the region had a median depth of 0X 
and the gene depth distributions were significantly different compared to the 
remaining pMT1 plasmid genes (p values < 1E-9). For the RISE397 sample, 
the regions had 0.43X and 0.42X median depths and there was no significant 
difference in the depth of the genes in the two regions (p value 0.77). 
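
For readers unfamiliar with the rank-sum comparison used above, the following is a minimal pure-Python sketch of a two-sided Wilcoxon rank-sum (Mann-Whitney U) test using the normal approximation with a tie correction and no continuity correction. The text does not say which software or variant the authors used, so this is an assumed, simplified version, and the depth values are invented.

```python
import math

def rank_sum_test(x, y):
    """Two-sided rank-sum test; returns (U statistic for x, approximate p value)."""
    combined = sorted((v, g) for g, vals in enumerate((x, y)) for v in vals)
    n1, n2 = len(x), len(y)
    n = n1 + n2
    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and combined[j][0] == combined[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1..j
        t = j - i
        tie_term += t ** 3 - t
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if combined[k][1] == 0)
        i = j
    u = rank_sum_x - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u, 1.0  # all values identical, so no evidence of a difference
    z = (u - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return u, p

# Invented gene depths: a region with no coverage versus the rest of a plasmid
inside = [0, 0, 0, 0, 0, 0, 0, 0]
outside = [5, 6, 7, 8, 9, 10, 11, 12]
u_stat, p_value = rank_sum_test(inside, outside)
```

With many more genes per group, as on a full plasmid, a complete absence of coverage in one region drives the p value far lower than in this small toy case.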

ACCESSION NUMBERS 

The accession number for the reads for the seven Y. pestis samples reported in 
this paper is ENA: PRJEB10885. 
SUMMARY 

LINE-1 retrotransposons are fast-evolving mobile 
genetic entities that play roles in gene regulation, 
pathological conditions, and evolution. Here, we 
show that the primate LINE-1 5'UTR contains a 
primate-specific open reading frame (ORF) in the 
antisense orientation that we named ORF0. The gene 
product of this ORF localizes to promyelocytic 
leukemia-adjacent nuclear bodies. ORF0 is present in 
more than 3,000 loci across human and chimpanzee 
genomes and has a promoter and a conserved 
strong Kozak sequence that supports translation. 
By virtue of containing two splice donor sites, 
ORF0 can also form fusion proteins with proximal 
exons. ORF0 transcripts are readily detected in 
induced pluripotent stem (iPS) cells from both 
primate species. Capped and polyadenylated ORF0 
mRNAs are present in the cytoplasm, and 
endogenous ORF0 peptides are identified upon proteomic 
analysis. Finally, ORF0 enhances LINE-1 mobility. 
Taken together, these results suggest a role for 
ORF0 in retrotransposon-mediated diversity. 

INTRODUCTION 

Transposable elements (TEs) are mobile genetic elements that 
can alter their chromosomal locations in the host genomes. 
TEs, first discovered by Barbara McClintock in maize 
(McClintock, 1950), are abundantly present in nearly all genomes 
studied to date; they influence gene expression and shape the 
genomes over evolutionary time (Huang et al., 2012). There are 
two classes of TEs based on their transposition mechanisms: 
DNA transposons and retrotransposons. DNA transposons 
mobilize with a cut-and-paste mechanism, whereas retrotransposons 
move by copy-and-paste via an RNA intermediate 
(Kleckner, 1990; Luan et al., 1993). Autonomous elements from 
both classes are defined as TEs that encode the proteins 
required for transposition, whereas non-autonomous elements 
depend on such proteins to be provided in trans. In primate genomes, 
most active TEs belong to the retrotransposon families. 
Of these, LINE-1 (L1) elements are the only autonomous elements 
that are currently active (Dewannieux et al., 2003; Hancks 
et al., 2011) and thus have directly and indirectly contributed to 
~30% of the human genome (Lander et al., 2001). At present, 
the majority of L1 elements are inactive, due to accumulated mutations 
as well as 5' truncations that are common during the integration 
process, thus reducing the number of estimated active 
elements to ~80 per genome (Brouha et al., 2003). The first 
active L1 element was isolated through analysis of mutagenic 
L1 insertions into the factor VIII gene in hemophilia A patients 
(Dombroski et al., 1991). Since then, retrotransposon germline 
insertions have been linked to ~100 human diseases (Hancks 
and Kazazian, 2012). 

Intact, active L1s are ~6 kb long and contain a 5'UTR, two 
open reading frames (ORF1 and ORF2), and a short 3'UTR (Scott 
et al., 1987). The L1 5'UTR has promoter activity in both the 
sense and antisense (ASP) directions (Speek, 2001; Swergold, 
1990). ORF1 encodes an ~40 kDa RNA-binding protein that is 
required for L1 transposition (Kolosha and Martin, 1997; Moran 
et al., 1996). However, ORF1 does not have any significant 
sequence similarity to known proteins (Goodier et al., 2007). 
ORF2 is a large protein at ~150 kDa with endonuclease and 
reverse transcriptase activities (Mathias et al., 1991). These 
activities, as well as the function of a cysteine-rich region at the C 
terminus, are important for L1 mobility (Feng et al., 1996; Moran 
et al., 1996). 

Regardless of their ability to mobilize, L1s contribute to 
transcriptome diversity and gene regulation (Cordaux and Batzer, 
2009). Transcription initiated in both directions can extend 
beyond the L1 sequence but, due to the presence of a polyA 
signal at the end of the 3'UTR, most sense transcripts end within 
the element. However, extensions into the genomic flank are 
also frequently observed and can lead to 3' transductions (Moran 
et al., 1999). Analyses of cloned cDNAs provide evidence of 
antisense transcripts that are spliced into exons in the neighboring 
genomic sequences (Macia et al., 2011; Matlik et al., 2006; 
Wheelan et al., 2005). Recent studies have focused on specific 
examples of spliced transcripts with a focus on disease, and a 
number of L1-driven transcripts have been shown to exist in 
cancer cells (Cruickshanks and Tufarelli, 2009). In addition to 
driving genes, antisense transcripts have been linked to 
chromatin modifications that influence gene expression 
(Cruickshanks et al., 2013). 

A recent analysis of L1s in primates showed that, while ORF1 
and ORF2 sequences have been relatively well conserved, 
acquisition of new 5'UTRs frequently occurred during primate 
evolution, providing the diversity that resulted in selection of 
the current 5'UTR (Khan et al., 2006). With the above in mind, 
we set out to improve our understanding of the properties of 
the primate L1 5'UTR. Here, we show that the currently active 
primate L1 5'UTR has well-conserved properties that support 
translation of an ORF that we have named ORF0. ORF0 is 
encoded by a primate-specific antisense ORF that lies downstream 
from the ASP and has a strong, well-conserved Kozak sequence. 
The gene product of this ORF is predominantly nuclear and 
localizes to promyelocytic leukemia (PML)-adjacent bodies. ORF0 
also has two prominent splice donor (SD) sites at nucleotides 
106 and 191 (amino acids 35 and 64) that can act in concert 
with splice acceptors (SAs) in downstream genomic sequences 
to generate fusion proteins. ORF0 mRNAs are capped, 
polyadenylated, and associated with ribosomes, and upon immunoaffinity 
purification, peptides from endogenous ORF0 products can be 
detected by mass spectrometry. Lastly, overexpression of 
ORF0 leads to a modest but significant increase in L1 mobility. 
Thus, we have identified and begun to characterize a third ORF 
from primate L1 retrotransposons. 

RESULTS 

Identification of an ORF in the Human Antisense 
L1 5'UTR 

We started by analyzing the antisense 5'UTR for the presence of 
ORFs that have an upstream promoter, start with ATG, and have 
a strong Kozak sequence determined by the presence of A/G in 
position -3 and G at position +4 (Kozak, 1987). Only one 
potential ORF exists that meets these criteria and, due to its 5' position 
with respect to ORF1 and ORF2, we have called it ORF0. ORF0 
lies between nucleotides 452-236 from the 5' end of LINE-1 in 
the antisense orientation and contains two SD sites (red boxes) 
within the potential coding sequence (Figure 1A). There are 
~781 loci that could encode full-length (FL) ORF0 in the human 
genome; the consensus sequence for the FL ORF0 protein 
obtained from these loci is shown in Figure 1A. The chimp ORF0 
consensus sequence from ~395 FL ORF0 loci is identical to 
that of the human. 
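
The sequence criteria stated above (an ATG start with A or G at position -3 and G at position +4) lend themselves to a simple scan. The Python sketch below is an illustration of those criteria only; it does not test for an upstream promoter, the function names and the minimum-length parameter are assumptions, and the toy sequence is invented rather than taken from L1.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def reverse_complement(seq):
    """Reverse complement, for scanning the antisense strand."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]

def has_strong_kozak(seq, atg_index):
    """True if ATG at atg_index has A/G at -3 and G at +4 (A of ATG is +1)."""
    if atg_index < 3 or atg_index + 3 >= len(seq):
        return False
    return (
        seq[atg_index:atg_index + 3] == "ATG"
        and seq[atg_index - 3] in "AG"
        and seq[atg_index + 3] == "G"
    )

def find_strong_kozak_orfs(seq, min_codons=20):
    """Return (start, end) of ORFs that begin at a strong-Kozak ATG.

    end is exclusive and includes the stop codon; min_codons counts the
    start codon but not the stop codon.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 2):
        if not has_strong_kozak(seq, i):
            continue
        for j in range(i + 3, len(seq) - 2, 3):
            if seq[j:j + 3] in STOP_CODONS:
                if (j - i) // 3 >= min_codons:
                    hits.append((i, j + 3))
                break
    return hits

# Invented toy sequence: A at -3, G at +4, two codons, then TGA
toy = "CCACCATGGCTGCTTGACC"
orfs = find_strong_kozak_orfs(toy, min_codons=3)
```

To scan the antisense strand of a 5'UTR, as in this section, one would pass `reverse_complement(utr_sequence)` to `find_strong_kozak_orfs`.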

The previously mapped L1 ASP lies upstream of ORF0, with 
some overlap (Speek, 2001). This overlap prompted us to check 
whether the promoter activity resided upstream of the initiator 
methionine (1st Met) of ORF0. Results from luciferase reporter 
assays suggested that promoter activity was upstream but not 
downstream of the ORF0 1st Met, and we further mapped a minimal 
ORF0 promoter of ~150 bp that had similar activity to the 
previously described L1 ASP (Figure 1B). We also cloned a number of 
polymorphic ORF0 promoters upstream of luciferase and GFP 
reporters. While variable, all the tested promoters were active 
(data not shown). This finding is consistent with previous 
observations that a high percentage of L1 5'UTRs have antisense 
promoter activity (Macia et al., 2011). Next, in vitro translation 
of HA-tagged ORF0 was tested in rabbit reticulocyte lysates 
and confirmed with western blot analysis (Figure 1C). 

To investigate whether this potential ORF could be translated 
in human cells, we removed the stop codon of ORF0 and cloned 
it upstream of a promoterless, in-frame GFP coding sequence 
that lacked the first ATG. Upon transfection, western blot 
analysis showed that, indeed, the ORF0 promoter and the context 
around the 1st Met of ORF0 were sufficient to translate the 
ORF0-GFP fusion protein (Figure 1D). 

ORF0 Protein Is Predominantly Nuclear and Present in 
PML-Adjacent Foci 

To analyze the subcellular localization of ORF0, we generated a 
GFP-tagged ORF0 clone in an L1 context (GFP-ORF0-L1). Since 
two SD sequences that were often involved in generation of 
spliced antisense transcripts (Speek, 2001) fell within ORF0, to 
allow detection of both spliced and unspliced products, GFP 
was placed at the N terminus but downstream of the Kozak 
context of ORF0 to minimize any effects on translation initiation 
(Figure 1E). Western blot analysis confirmed that GFP-ORF0 
fusion protein was generated (Figure 1E). Importantly, when 
the 1st Met of ORF0 was mutated to threonine (M1T), we 
observed that GFP signal was lost, showing that translation 
started from the 1st Met of ORF0 (Figure 1F) and ruling out any 
potential upstream translation initiation. Furthermore, addition 
of a polyA signal downstream of ORF0 at the end of the L1 did 
not change protein localization, suggesting that the produced 
ORF was contained within the L1 and was not a splicing product 
with the downstream flank (Figure S1A). We also fused ORF0 to 
mCherry (29% identity to EGFP) and observed a very similar 
pattern, suggesting that the sequence of the tag was not driving 
the localization (Figures S1B and S1C). Interestingly, ORF0, but 
not GFP alone from the same plasmid backbone, was localized 
predominantly in nuclear foci in the majority of cells (Figures 1F 
and S1D-S1F). As predicted by the charge distribution of amino 
acid residues, the C terminus portion of ORF0 was required for 
nuclear localization (Figure S1G). Since a number of ORF0 
variants may be encoded due to polymorphisms in L1 sequences, 
we cloned some of these variants and observed that, unless 
truncated, most localized similarly (data not shown). 

Based on the numbers and distribution of foci, we 
hypothesized that ORF0 localization could be related to PML bodies. 
PML bodies are nuclear proteinaceous structures often 
associated with the nuclear matrix and are involved in a wide variety 
of processes that may influence L1 biology: stress, anti-viral 
and DNA damage response, transcriptional regulation, 
heterochromatin, and post-translational protein modifications 
(Bernardi and Pandolfi, 2007). Indeed, in cells transfected with 
PML-IV-GFP and mCherry-ORF0, high-magnification imaging 
Figure 1. Identification of ORF0 in L1 5'UTR 

(A) Location of ORF0 in L1. The start codon ATG 
and the stop codon TGA are labeled red in the 
antisense orientation. The positions of splice 
donor sites within the coding sequence are 
indicated with red squares. Consensus protein 
sequence of full-length ORF0 based on ~781 
potential ORF0 loci in the human genome. 

(B) Upstream ~150 bp region of ORF0 has 
promoter activity. Luciferase assays were performed 
to determine promoter activity of the L1 5'UTR 
regions shown in the panel below the graph. Red 
and orange lines represent antisense and sense 
strands, respectively. DSMet refers to 
downstream of initiator methionine. Data are presented 
as mean ± SEM. *Denotes p < 0.05 significance 
between indicated groups using t test. CTRL 
denotes control. 

(C) ORF0 can be translated in vitro. HA-tagged 
ORF0 production was monitored by western 
blotting. 

(D) Production of ORF0-GFP fusion protein was 
detected by GFP western blot. The C-terminal 
GFP-tagged ORF0 construct driven by the 
upstream region of ORF0 is shown at the bottom. 
Black arrows indicate GFP alone and the fusion 
protein. Red arrow highlights the size shift. 

(E) GFP-ORF0 fusion protein was detected by 
western blot. Design of GFP-ORF0 construct in L1 
context. GFP is cloned at the N terminus of ORF0 
downstream of the 1st Met and potential Kozak 
context. Red arrow highlights the size shift in the 
generated protein. 

(F) Translation of GFP-ORF0 is dependent on the 
ORF0 initiator methionine. Fluorescent detection 
of ORF0 localization upon transfection of the 
construct depicted in (E) into HEK293T cells. WT, 
wild-type; M1T, initiator methionine to threonine 
mutant. Scale bar, 10 μm. 

(G) Most of ORF0 protein localizes to 
PML-adjacent nuclear bodies. Confocal imaging of cells 
transfected with mCherry-ORF0- and 
GFP-PML-IV-encoding plasmids. Scale bar, 4 μm. 

(H) Spot representation of ORF0 (red) and PML 
(green) foci. Images from 90° and 180° relative to 
Movie S1 are shown. Scale bar, 1 μm. 

See also Figure S1. 



showed that ORF0 was present in PML-adjacent foci (Figure 1G). 
Spot analysis of confocal z series confirmed this observation 
(Figure 1H; Movie S1). 

A Large Number of ORF0 Loci with a Conserved 
Functional Kozak Context Exist in Primate Genomes 

We sought to determine how many loci could potentially encode 
ORF0 in the human and chimp genomes. Taking splicing into 
consideration, we scanned these genomes for potential ORF0 
loci that are untruncated up to the two commonly used SD sites 
and have an adjacent GT dinucleotide. Human and chimp 
genomes have ~3,528 and ~3,299 such loci (of which ~974 and 
~745 are species-specific, respectively) that have the potential 
to splice into the genomic flanks and generate fusion proteins 
(Figures 2A and 2B). All FL ORF0 loci contain at least one SD 
and, as a result, they are present in this set. The L1 family 
classification of ORF0 loci is shown in Table S1. Considering insertional 
polymorphisms within populations and somatic insertions, the 
number of ORF0 loci may be even larger. Analysis of human 
and chimp genomes for ORF0 loci revealed a conserved strong 
Kozak context around the first ATG (Figure 2C). To test the 
functionality of the consensus wild-type ORF0 Kozak (WT ORF0), 
we mutated it to an optimal Kozak sequence (OPT) as well as a 
-3/+4 mutant (MT ORF0). Expression of GFP-ORF0 was 
comparable between WT ORF0 and OPT, whereas the -3/+4 
mutation abolished translational activity (Figures 2D and S2A). 

We also extended our ORF0 analysis across mammalian 
genomes and found ORF0 loci with homology throughout the 
potential coding sequence only in the genomes of Catarrhini. 
Within this parvorder of primates, Old World monkey and ape 
genomes contain on average ~50 and ~2,500 such ORF0 loci, 
respectively. Consensus Kozak sequences derived from these 
loci suggest that the ORF0 Kozak context is conserved, 
including the G-3 and G+4 positions (red boxes) (Figure S2B). 
In New World monkeys, a very small number of ORF0 loci with 
limited N terminus homology were observed; however, due to 
the low number, a reliable consensus could not be built and 
thus these genomes were excluded from further investigation. 

We next focused on the ORF0 coding sequences to get a 
better picture of evolutionary conservation of ORF0 within human L1 
families and across primates. The alignments of ORF0 proteins 
from consensus L1PA1-8 sequences (Khan et al., 2006) are 
shown in Figure S2C. L1PA1 (which includes L1HS) and L1PA2 
have intact SD1 and SD2. L1PA3-L1PA6 families contain a 
longer ORF0 due to a frameshift after SD2. In L1PA5 and 
L1PA6, SD1 is mutated but SD2 is conserved (data not shown). 
L1PA7 and L1PA8 have C termini that are distinct from the other 
L1PA families and lack SD1 and SD2. The abovementioned 
variation across L1PA families was recapitulated in the maximum 



Figure 2. More than 3,000 Potential ORF0 
Loci with a Conserved and Functional Kozak 
Sequence Exist in the Human and 
Chimp Genomes 

(A and B) Chromosomal locations of ORF0 loci 
in the human and chimp reference genomes. The 
human and chimp genomes have ~3,528 and 
~3,299 loci, respectively, that have the potential to 
splice into the genomic flanks and generate fusion 
proteins. 

(C) ORF0 loci have a conserved strong Kozak 
context. Logo of Kozak sequences of ORF0 loci in 
human and chimp genomes. Start codon is 
underlined with red, and important nucleotides for 
translation initiation are underlined with black. 

(D) The ORF0 Kozak sequence is functional. 
Western blot analysis of ORF0-GFP fusions driven 
by optimal (OPT), wild-type (WT ORF0), and 
mutant (MT ORF0) Kozak sequences from the 
GFP-ORF0-L1 construct. Arrow highlights the 
GFP-ORF0 protein. 

(E) Basic phylogenetic analysis of ORF0 
sequences in human L1PA families. ORF0 coding 
sequences were extracted from L1PA family 
consensus sequences and used in generating the 
maximum likelihood tree. 

(F) Alignment of consensus ORF0 sequences 
derived from Catarrhini species. Charged residues 
are labeled in red and blue for positively 
and negatively charged, respectively. These 
consensus sequences were used in building 
the maximum likelihood tree for these primate 
species. 

See also Figure S2 and Table S1. 



likelihood tree (Figure 2E). Next, we 
generated consensus ORF0 sequences 
from the Catarrhines for comparison 
(Figure 2F). These primates have very similar 
consensus ORF0 proteins, except for the 
region between residues ~42 and 50. 
While the consensus ORF0 sequences of 
all species contain SD2, rhesus and 
baboons lack SD1 due to a point mutation (Figure S2D). The 
maximum likelihood tree from the ORF0 sequences of Catarrhine 
genomes is shown in Figure 2F. 

Capped and Polyadenylated ORF0 mRNAs Are Present 
in the Cytoplasm 

One would expect ORF0 to be tightly regulated as a 
transposable element protein. In addition, short ORFs are technically 
challenging to uncover (Andrews and Rothnagel, 2014). To 
determine whether transcription from ORF0 loci could be 
detected, we turned to transcriptomic data. Cap analysis of gene 
expression (CAGE) data allow the mapping of transcription start 
sites (TSSs) and thus make it possible to identify the 5' end of 
transcripts that originate from L1 (Faulkner et al., 2009; Shiraki 
et al., 2003). Our analysis of CAGE data showed that the majority 
of TSSs for antisense RNAs are upstream of the ORF0 1st Met, 
suggesting that most antisense transcripts could have the capacity 
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Figure 3. ORFO-Gene Fusion Transcripts 
Are Expressed in Human and Chimp iPS 
Cells 

(A) Most of the antisense L1 transcription starts upstream of ORF0. 
Cytoplasmic polyA plus K562 CAGE (ENCODE/RIKEN) reads were mapped to the 
L1HS consensus sequence. 

(B) Protein logo of ORF0 loci that are untruncated until splice donor sites 
in the human and chimp genomes. Sequences from SD1 and SD2 loci are 
represented as protein sequence logos. Positions of SD1 and SD2 are 
indicated with black boxes. 

(C and D) Table of top 25 protein-coding genes, for which RNA-Seq reads 
were detected at the splice junction with ORF0 in human and chimp iPS 
cells. Red-labeled genes have ORF0 fusions due to species-specific L1 
insertions. Blue labeling represents genes for which ORF0 fusion 
transcripts were detected in both human and chimp iPS samples. Transcripts 
of black-labeled gene fusions are detected only in one species. The ratios 
of ORF0 isoforms with respect to the total (i.e., ORF0 + annotated gene 
isoforms) are shown in the ratio column. Table was sorted for ratio from 
high to low. 

(E) Ribosome footprinting data from HEK293T cells were mapped to the L1HS 
consensus sequence. 

See also Figure S3 and Table S2. 



to encode ORF0 (Figure S3A). More importantly, ORF0 mRNA could be detected 
not only in whole cell but also in the cytoplasmic fraction; capped and 
polyadenylated ORF0 mRNAs were present in the cytoplasm (Figure 3A). 

Most intronic L1s are in the reverse orientation with respect to their host 
genes (Smit, 1999), including L1s with intact ORF0: ~650 protein coding 
genes in human and ~450 in chimp contain ORF0 loci in the same direction as 
host gene transcription (data not shown), raising the possibility of a 
number of ORF0-host gene fusion events. The sequence logos of ORF0 loci in 
human and chimp that have the potential to splice, along with commonly used 
SD sites, are shown in Figure 3B. 

To identify ORF0 fusion transcripts in human and chimp, we turned to RNA 
sequencing (RNA-seq) data that we had generated from iPS cells (Marchetto 
et al., 2013). Indeed, RNAs for a number of fusion events were observed in 
both species (Figures 3C, 3D, and S3B). Analysis of the contribution of 
ORF0 isoforms (iso-0) to the expression of these genes suggested that some 
genes were primarily driven by ORF0 in iPS cells, whereas for other genes, 
iso-0 contribution ranged from moderate to minor (Figures 3C and 3D; 
Table S2). We also extended our analysis to fibroblast-iPS pairs and 
observed that ORF0 transcript levels were dramatically upregulated in both 
human and chimp pluripotent stem cells compared to respective source 
fibroblasts (Figures S3C and S3D and data not shown). 

ORF0 mRNAs Are Associated with Ribosomes 

The presence of capped ORF0 mRNAs with a polyA tail in the cytoplasm, as 
well as fusion transcripts with proximal exons of protein coding genes, 
prompted us to investigate whether ORF0 RNAs were associated with ribosomes 
by analyzing ribosome footprinting (Ribo-seq) data (Ingolia et al., 2011). 
First, we mapped Ribo-seq reads obtained from the HEK293T cell line (Shalgi 
et al., 2013) to the L1HS consensus sequence (Figure 3E). In the sense 
orientation, a plateau of ribosome footprints was detected for ORF1 but 
ORF2 signal was much weaker, a finding that is in accordance with the known 
translation levels of ORF1 and ORF2 proteins (Alisch et al., 2006). In the 
antisense orientation, a strong signal was evident for ORF0 (Figure 3E). 
Interestingly, this signal also extended beyond the FL ORF0 sequence, which 
may be due to within-L1 splicing events (see below and data not shown) or 
L1s from older families, in which the encoded consensus ORF0 extends until 
the end of L1 (see Figure S2C). Even though reads obtained by ribosome 
footprinting were shorter than those gained from RNA-seq, we observed 
spliced ORF0 footprints of in-frame fusions to SCAMP1, SLC44A5, GJB4, 
HTR2C, and RABGAP1L (driven by a human-specific L1 insertion). Thus, the 
influence of ORF0 may not necessarily be limited to L1 biology. 

ORF0-Downstream Exon Fusion Protein Is Expressed 

To test whether ORF0 could be transcribed and translated from 
an intronic position, we cloned the GFP-ORF0-L1 cassette in the 



Figure 4. ORF0 Protein: Intronic Expression, Endogenous Detection, and 
Effect on L1 Mobility 

(A) GFP-ORF0 can be expressed from an intronic position in an ORF0 
initiator Met-dependent manner. The GFP-ORF0-L1 cassette (wild-type or M1T) 
was cloned in the antisense orientation in an intron. GFP was detected by 
confocal microscopy. Scale bar, 10 μm. 

(B) Western blot analysis of GFP-ORF0 expression suggests that intronic 
ORF0 protein is produced but is not full-length. The fusion protein 
expressed from the intronic construct is indicated with the black arrow. 

(C) Functionality of ORF0 antibody was tested using overexpressed protein, 
by immunoprecipitation, and subsequent western blotting. 

(D) Schematic description and sequences of identified ORF0 peptides. The 
first peptide (black square) resides upstream of SD2. The second peptide 
(red square) spans the splice junction of proteins formed through splicing 
between SD2 and SA1. The third peptide (green square) is located downstream 
of SA1 within the L1 sequence. 

(E-G) Spectra of peptides (#1, #2, and #3) identified by proteomic 
searches. Green peaks in (G) represent neutral losses. 

(H and I) Overexpression of ORF0 protein, but not ORF0 RNA, increases L1 
mobility based on luciferase L1 reporter in HEK293T cells and human NPCs. 
Potential antisense RNA effects were controlled for by using a 
single-nucleotide mutant ORF0 that replaces the initiator Methionine with 
Threonine. Data are presented as mean ± SEM. *Denotes p < 0.05 significance 
between indicated groups using t test. 

See also Figure S4. 



antisense orientation within a natural human intron. Upon transfection of 
this construct into cells, GFP-ORF0 was expressed. Moreover, translation 
started at the ORF0 1st Met, as the M1T mutation abolished expression 
(Figure 4A). Interestingly, GFP signal was localized throughout the cell 
instead of in nuclear foci. This difference in localization was explained 
by western blot analysis, which showed that intronic GFP-ORF0 fusion 
protein was different from GFP alone or GFP-FL ORF0, suggesting that a 
spliced product was translated (Figure 4B). Generation of a fusion protein 
via splicing between SD1 of ORF0 and the downstream exon was confirmed by 
sequencing (data not shown). 

Proteomic Detection of Endogenous ORF0 Peptides 

Having observed ORF0 transcripts as well as expression from reporter 
plasmids, we investigated endogenous ORF0 products. Proteomic 
identification of ORF0 requires detection of peptides within unspliced ORF0 
or N terminus ORF0 fragments of fusion proteins. Therefore, due to the 
small size of ORF0, a limited number of possible peptides are available for 
detection by mass spectrometry. In addition, the distribution of the target 
residues (K and R) for trypsin, the most commonly used enzyme for 
proteomics, leads to the generation of non-ideal peptide fragment sizes 
(see Figure 2F): the N terminus is poor in these residues whereas the 
C terminus is rich, generating a very small number of peptides optimal for 
mass spectrometry. In fact, only one peptide from the main body of ORF0 
could be detected in our mass spectrometry analysis of overexpressed ORF0 
(Figure S4A). 
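
The effect of an uneven K/R distribution on tryptic peptide sizes can be illustrated with a minimal in silico digestion sketch. It is an illustration only, not the procedure used in this study: the toy sequence is invented (it is not ORF0), the 7-25 residue "MS-friendly" window is an assumed rule of thumb, and the cleavage rule is the commonly used simplification that trypsin cuts after K or R unless the next residue is P.

```python
import re

def tryptic_peptides(protein, min_len=7, max_len=25):
    """Digest a protein in silico with a simplified trypsin rule.

    Cleaves C-terminal to K or R, except when the next residue is P.
    Returns (all_peptides, ms_friendly), where ms_friendly keeps only
    peptides whose length falls in the assumed window [min_len, max_len].
    """
    # Zero-width split: position preceded by K/R and not followed by P.
    pieces = [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]
    friendly = [p for p in pieces if min_len <= len(p) <= max_len]
    return pieces, friendly

# Hypothetical toy sequence (NOT ORF0): a K/R-poor N terminus followed by
# a K/R-rich C terminus, mimicking the situation described in the text.
TOY = "MASLTGDPEVFQNLSAGTWIDELPSVAGTQK" + "LSDEVTGNAR" + "RKGRKKSRRAKGRK"

if __name__ == "__main__":
    peptides, friendly = tryptic_peptides(TOY)
    for pep in peptides:
        tag = "ok" if pep in friendly else "too long/short"
        print(f"{len(pep):3d}  {pep}  ({tag})")
```

In this toy case the K/R-poor N terminus yields one peptide that is too long, the K/R-rich C terminus yields many fragments of one or two residues, and only a single peptide falls inside the assumed window.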

Nevertheless, we proceeded to attempt detection of endogenous ORF0 
peptides. We started by raising polyclonal anti-ORF0 antibodies targeting 
the consensus L1HS FL-ORF0 protein. Upon confirmation that the ORF0 
antibody worked for immunoaffinity enrichment from overexpressed HA-ORF0 
extracts (Figure 4C), we turned to the cultured cell type that expressed 
the highest levels of ORF0 transcripts as a class: human pluripotent stem 
cells. In parallel, we computationally generated an RNA expression-based 
ORF0 proteomics database that included potential unspliced and spliced ORF0 
proteins. The combined ORF0-Human Uniprot database was used in spectra 
searches. Next, immunoprecipitates from control and ORF0 antibody were 
subjected to mass spectrometry analysis. Liquid chromatography-tandem mass 
spectrometry (LC-MS/MS) spectra searches did not find any ORF0 fragments in 
control antibody samples. However, searches of anti-ORF0 immunoprecipitates 
led to identification of endogenous ORF0 peptides (Figures 4D-4G). Spectra 
obtained from overexpressed peptides for comparison and further information 
on all the spectra are presented in Figures S4A-S4D. The first peptide 
(black square) resides upstream of SD2. The second peptide (red square) 
spans the splice junction of proteins formed through splicing between SD2 
and SA1 (SA1: based on RNA-seq analysis, a functional splice acceptor site 
336 nucleotides downstream of the ORF0 start site in the L1 5'UTR 
antisense). The third peptide (green square) is located downstream of SA1 
within the LINE-1 sequence (Figure 4D). There are multiple loci that can 
encode the observed ORF0 peptides and the exact identities of source loci 
are currently unknown. 

ORF0 Enhances L1 Mobility 

Given the fact that the ORF0 coding sequence resides in the L1 5'UTR with 
bidirectional promoter activity, the most parsimonious function for ORF0 
would be a potential effect on L1 mobility. Human L1s driven by CMV or CAG 
promoters are mobile (Moran et al., 1996); thus it is clear that ORF0 is 
not essential for L1 activity. We attempted to test potential in cis 
effects of ORF0 mutations; however, this task was hampered by the fact that 
the ORF0 sequence overlaps with the forward L1 promoter (data not shown). 
Thus, we overexpressed ORF0 in trans and tested for its effect on L1 
mobility. To prevent any direct antisense L1 RNA effect due to 
transcription of ORF0, we used a CAG promoter-driven L1 reporter. In 
HEK293T cells, ORF0 expression led to a ~41% increase in L1 mobility 
(Figure 4H). To rule out any indirect effects of expressing antisense L1 
RNA, we also used the single nucleotide mutant control, ORF0 M1T, that did 
not produce ORF0 protein. This construct had no effect on L1 mobility, 
strongly suggesting that ORF0 protein was responsible for the observed 
increase (Figure 4H). Importantly, wild-type, but not M1T mutant, ORF0 also 
increased L1 mobility in human embryonic stem (ES) cell-derived neural 
progenitors (human NPC) by ~38% (Figure 4I), bringing forth the possibility 
that ORF0 may contribute to somatic variation by enhancing L1 activity in 
pluripotent cells. 

DISCUSSION 

The constant competition between transposable elements and host-protective 
mechanisms contributes to genome evolution (Daugherty and Malik, 2012; 
Slotkin and Martienssen, 2007). It is currently unclear whether L1 
antisense promoter activity has been a major factor in this arms race. From 
an L1 perspective, antisense transcription can positively influence sense 
expression through recruitment of transcriptional machinery, inducing open 
chromatin structure or via formation of a non-coding RNA. On the other 
hand, expression of antisense RNA can lead to dsRNA formation, which may 
trigger an RNAi response (Matlik et al., 2006; Yang and Kazazian, 2006). 
Our results suggest that, in addition to the aforementioned roles, L1 5'UTR 
has the ability to initiate translation in the antisense direction. 

ORF0 is present in more than ~2,500 loci in the ape genomes, whereas this 
number is much smaller in the Old World monkeys. While some of this 
difference may be related to variable genome sequence quality, we expect 
this difference to mostly represent L1 biology. The alignment of ORF0 
sequences from human L1PA1-8 suggests that the main difference between 
these families is the C terminus of ORF0. We have also noticed that the 
sequences around the ORF0 translation start site influence forward promoter 
activity. It is possible that the translation activity in the antisense L1 
5'UTR is coupled with the forward promoter activity, and thus the 
N terminus is more conserved with respect to the rest of the ORF0 sequence 
due to evolutionary pressure. If that indeed is the case, translation 
activity in rhesus and baboon may generate distant relatives of the ape 
ORF0. Consistent with this hypothesis, searches for ORF0 in New World 
monkey genomes reveal a very small number of loci that have homology to 
human ORF0, with similarity only at the N terminus. Considering the fact 
that L1 retrotransposons recruit new 5'UTRs over time, it is conceivable 
that distant primates such as marmosets and squirrel monkeys may have 
significantly different 5'UTRs. Improved primate genome sequence quality 
and future experimentation will allow the testing of these possibilities. 

Expression of ORF0, but not an untranslated point mutant version, enhances 
L1 activity from L1 luciferase mobility reporter in human cells, suggesting 
a role for ORF0 protein in L1 activity. We currently do not know the 
mechanism of this effect. Similar to ORF1, ORF0 does not share any 
extensive homology with known genes, so it is not possible to propose a 
domain-based prediction. However, ORF0 is a highly positively charged 
protein that may act by binding to nucleic acids. The PML proximity to ORF0 
is intriguing, especially given that a large number of proteins are 
recruited to PML bodies depending on the cellular state, with stress 
playing a prominent role in determining the content as well as the 
morphology of PML bodies. Interestingly, PML is involved in antiviral 
responses and protects cells from viral infections. Some viral proteins 
target the integrity of PML bodies and a large number of components are 
transcriptionally regulated by the interferon pathway (Everett and 
Chelbi-Alix, 2007). Whether localization adjacent to PMLs is reflective of 
ORF0 function or the cell's response remains to be seen. It is possible 
that ORF0, analogous to some viral proteins, may interfere with the 
functions of PML and enhance mobility. Further studies will be required to 
gain insight into the mechanism of action of ORF0. 

The influence of ORF0 may not necessarily be limited to L1 biology. Our 
transcriptomic analysis suggests that exons of host genes provide splice 
acceptor sites for intronic or proximal ORF0 loci. Overall, ORF0 expression 
levels correlate with the pluripotency of the cell types and ORF0-proximal 
exon fusion products are detected by proteomics. While any effects of ORF0 
expression on the host or proximal gene would be context and sequence 
dependent, one could make certain predictions. If the downstream exon is in 
frame with respect to ORF0 upon splicing, the N terminus of the host 
protein would be replaced by an ORF0 variant, which could alter the 
localization and/or function. Out-of-frame ORF0 fusions would contain amino 
acids from an alternative frame of the gene and most would encounter a stop 
codon. Such transcripts, depending on the context, might be expressed or be 
subject to nonsense-mediated decay (NMD). By virtue of high copy numbers 
and sequence variants, one would expect to see varying degrees of NMD 
response. In addition, cell-state transitions, stress, and crosstalk with 
the RNAi pathway might provide opportunities for NMD targets to be 
translated (Kervestin and Jacobson, 2012). In cases of fusions of ORF0 
located upstream of coding sequences, ORF0 might act as an upstream ORF 
(uORF). Since uORF function is affected by the length and sequence of the 
uORF as well as by the distance between the upstream and the main ORF, 
variations in ORF0 sequences could result in differential translation 
regulation (Andrews and Rothnagel, 2014). 

L1s, as the sole autonomously active retrotransposons in primate genomes, 
continue to shape our genomes. Our data suggest that, in addition to their 
previously ascribed roles in gene regulation (Huang et al., 2012), L1s 
contain a third ORF and have the ability to generate insertion 
site-dependent ORFs via splicing. Considering the fact that transcription 
and translation start within L1 elements, these ORF0 variants could be 
co-regulated. Analogous to the other L1 proteins, disorders such as 
neoplasms (Rodic et al., 2014) may provide opportunities for higher ORF0 
expression, which in turn could contribute to the pathological phenotypes. 
It is tempting to speculate that, over evolutionary time, the propensity of 
ORF0 to splice into proximal exons may have led to not only gene regulatory 
changes but also the emergence of new proteins. The extent to which ORF0 
variants contribute to diversity, both in evolutionary terms and disease 
conditions, remains to be investigated. 

EXPERIMENTAL PROCEDURES 
Cloning and Mutagenesis 

Primers from IDT and Phusion High Fidelity Polymerase (NEB) were used for 
PCRs. pGL4.10 (Promega) was the plasmid backbone used for promoter 
luciferase assays. To test the effect of ORF0 expression on L1 mobility, 
the ORF0 promoter, coding sequence, and the downstream sequence (until the 
end of L1 in the antisense orientation) were cloned into pEF-BOS-EX 
(Mizushima and Nagata, 1990). To include any potential within-L1 splicing 
products and prevent contribution from the plasmid backbone, a fragment 
containing stop codons in all three frames as well as a polyA signal was 
included immediately downstream of the insert. ORF0-GFP construct was 
cloned into a modified (SV40 promoter and luciferase removed) pSICheck2 
vector (Promega). A modified (luciferase cassette removed) pYX014 plasmid 
(Xie et al., 2011) was used for GFP-ORF0 and mCherry-ORF0 cloning: 
nucleotide 13 of ORF0 was mutated (C → G) to generate an AscI site that was 
used for subsequent cloning of GFP and mCherry. HA-tagged ORF0 was cloned 
into pcDNA3.1 for in vitro translation. GFP-ORF0-L1 cassette was cloned 
into pEF-BOS-EX with BglII for intronic expression. Mutagenesis was carried 
out using the QuikChange II XL Site-Directed Mutagenesis Kit (Agilent 
Technologies). 

RNA Extraction, Reverse Transcription, and cDNA Preparation 

RNA was prepared using Trizol (Invitrogen). cDNA was synthesized using the 
Superscript III First Strand Synthesis System for RT-PCR (Invitrogen). 

Cell Culture and Transfection 

HEK293T cells (ATCC) were cultured in DMEM GlutaMax medium (Life 
Technologies) supplemented with 10% fetal bovine serum (Omega Scientific) 
and grown at 37°C in 5% CO2. Cells were transfected using polyethylenimine 
(Polysciences). HUES6 human ES cells were cultured feeder-free on 
Matrigel-coated dishes (BD) using mTeSR1 (StemCell Technologies) and 
passaged once every 3-4 days using Collagenase type IV enzyme. 

Human NPC Derivation, Growth, and Nucleofection 

NPCs were differentiated from HUES6 cells through embryoid body and 
rosette generation and grown as previously described (Marchetto et al., 
2010). Plasmid delivery into human NPCs was performed by nucleofection 
(Lonza/Amaxa Nucleofector, kit VPG-1005). 

In Vitro Translation 

ORF0 was synthesized in vitro by employing the TNT Coupled Reticulocyte 
Lysate System (Promega) using T7 polymerase. 

Cell Extracts and Western Blot Analysis 

Cells were harvested 2 days post transfection, washed with cold DPBS, and 
lysates were prepared with ice cold RIPA lysis buffer (50 mM Tris-HCl 
[pH 7.4], 150 mM NaCl, 0.25% deoxycholic acid, 1% NP-40, 0.1% SDS, and 
1 mM EDTA) containing complete protease inhibitor cocktail with EDTA 
(Roche) and 1 mM DTT. Lysates were incubated on ice for 15 min, spun at 
14,000 × g for 15 min at 4°C, and the supernatants were collected. Primary 
antibodies: rabbit α-GFP (1:2,000, Santa Cruz sc-8334), rat α-HA peroxidase 
high-affinity 3F10 (1:1,000, Roche), and α-ORF0 (1:300). Secondary 
antibody: (1:5,000, GE NA934). 

Fluorescence Detection 

Cells were grown in poly-L-lysine (Sigma) coated 2-well LabTek chamber 
slides (Nunc, Fisher), fixed in 4% paraformaldehyde (Sigma) for 15 min at 
room temperature, and washed with TBS. The nuclei were stained with DAPI 
(1:1,000, Sigma) and the slides were mounted using polyvinyl alcohol with 
DABCO (Sigma). 

Computational Analyses 

Detection and visualization of ORF0 loci in human and chimp genomes: the 
UCSC genome browser and Ensembl databases were used to retrieve potential 
ORF0 coding sequences, which were subsequently in silico translated. The 
Ensembl databases (hg19, panTro4) were used for blastn, allowing some local 
mismatch but no gap to obtain ORF0 loci. An alternative method of 
retrieving all potential full-length ORF0 sequences from RepeatMasker was 
tested and led to very similar results. Custom python scripts and EMBOSS 
suite (Rice et al., 2000) were used for identification and characterization 
of ORF0 loci, full-length as well as untruncated-until-splice-donor, in the 
genome. Sequences that did not contain a GT dinucleotide at the splice 
donor site were removed. Ensembl Karyotype View tool was used for 
visualization of the ORF0 loci. Upon confirmation of an annotation error in 
the Chimp Chr 2B, the erroneous fragment was removed from the image. The 
removed region contained no genes or TEs. 

Analysis of RNA-seq datasets: RNA-seq (human and chimp iPS cells) data from 
GEO: GSE47626 (Marchetto et al., 2013), GEO: GSE44646 (Wang et al., 2014), 
GEO: GSE60996 (Gallego Romero et al., 2015), and ArrayExpress: E-MTAB-2031 
(Chan et al., 2013); CAGE (capped 5' RNA-seq) data from GEO: GSE34448 
(Djebali et al., 2012); Ribo-seq (ribosome footprinting) data from GEO: 
GSE32060 (Shalgi et al., 2013) were analyzed from raw FASTQ files in a 
consistent manner. Reads were aligned to the reference human (hg19) and 
chimpanzee (panTro4) genomes with STAR, which is capable of identifying 
novel splice junctions (Dobin et al., 2013). Spliced ORF0 reads were 
identified by filtering out all multimappers and only considering reads 
originating from an ORF0 locus (direct overlap of 5' end for stranded 
RNA-seq and direct overlap of either read end for unstranded RNA-seq). Read 
distributions along L1 were found by aligning reads to the consensus L1HS 
element using STAR (Dobin et al., 2013). Read densities along the + and - 
strands were further normalized based on the total number of reads in each 
experiment that were alignable to the full genome. Ratios of isoforms (ORF0 
versus total) were determined by comparing the splice junction reads (j) of 
ORF0, j(0c), to j(ab), j(bc), j(cd), j(de), where the order of exons is 
a-b-0-c-d-e: ratio = average((j(0c)/(average(j(ab), j(bc)) + j(0c))), 
(j(0c)/average(j(cd) + j(de)))). This allowed us to get a more reliable 
estimate compared to calculations that rely solely on ratio at one exon 
j(0c) and j(bc) and to reduce the 3' bias that is observed in polyA-based 
sequencing. In the few cases where the ratio is higher than 1 (maximum 
being 1.2), these ratios are presented as 1 in the tables (Figures 3C, 3D, 
and Table S2). Genes in the tables went through further manual inspection. 

Proteomic database generation: RNA-seq reads from human iPS/ES cells were 
assembled using Cufflinks, ORF0 containing transcripts were selected and 
redundancies were removed. In parallel, ORF0-containing mRNAs that are 
either ESTs or annotated transcripts were added to the RNA-seq list. The 
combined list was in silico translated and appended to the current human 
Uniprot database for spectra searches. 

Determination of species that have ORF0 loci: L1HS/L1Pt consensus ORF0 
sequence (identical) was used in BLAT and BLAST searches to determine the 
genomes that contain ORF0 loci. The absence of ORF0 loci in non-Catarrhine 
primates was further confirmed by in silico translation of L1 sequences 
(with repeat start <1,000) and subsequent search for loci that can encode a 
polypeptide with ≥50% identity to ORF0 protein (FL or SD1-ORF0). 

Generation of consensus primate ORF0 sequences for phylogenetic analysis: 
ORF0 loci that can encode an untruncated protein >210 nucleotides were 
retrieved via BLAST searches and subsequent in silico translation and 
filtering. The sequences were trimmed to 213 nucleotides (length of FL 
ORF0) and used in molecular phylogenetic analysis. 

Basic molecular phylogenetic analysis: Clustal Omega was used to generate 
the alignments. The evolutionary history was inferred by using the maximum 
likelihood (ML) method based on the JTT matrix-based model. The tree with 
the highest log likelihood is shown. A total of 1,000 bootstrap replicates 
were used for test of phylogeny. The percentage of trees in which the 
associated taxa clustered together is shown next to the branches. Initial 
tree(s) for the heuristic search were obtained automatically by applying 
Neighbor-Join and BioNJ algorithms to a matrix of pairwise distances 
estimated using a JTT model and then selecting the topology with superior 
log likelihood value. The tree is drawn to scale, and branch lengths 
represent number of substitutions per site. The analysis involved amino 
acid sequences and all positions with <95% site coverage were eliminated. 
MUSCLE generated alignments as well as maximum parsimony analysis generated 
a very similar tree. Evolutionary analyses were conducted in MEGA6 (Tamura 
et al., 2013). Analyses using RAxML and PhyML as well as neighbor joining 
methods resulted in very similar trees. DNA and protein logos were 
generated using WebLogo (Crooks et al., 2004). 
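
The isoform ratio described under Computational Analyses can be written out as a short sketch. This is one reading of the printed formula, not the authors' script: the function name and the example counts are hypothetical, and the second inner term is assumed to be the mean of j(cd) and j(de), which is the reading consistent with ratios occasionally exceeding 1. The cap at 1 follows the statement that ratios above 1 are presented as 1 in the tables.

```python
from statistics import mean

def orf0_isoform_ratio(j_ab, j_bc, j_0c, j_cd, j_de, cap=1.0):
    """Estimate the ORF0 isoform fraction from splice-junction read counts.

    Exon order is a-b-0-c-d-e. j_0c counts reads joining ORF0 to exon c;
    the other arguments count reads across annotated junctions.
    Returns None when a denominator would be zero.
    """
    upstream_denominator = mean([j_ab, j_bc]) + j_0c
    downstream_denominator = mean([j_cd, j_de])  # assumed reading of the formula
    if upstream_denominator == 0 or downstream_denominator == 0:
        return None
    upstream_term = j_0c / upstream_denominator
    downstream_term = j_0c / downstream_denominator
    ratio = mean([upstream_term, downstream_term])
    # Ratios above 1 are reported as 1, as stated for the tables.
    return min(ratio, cap)

if __name__ == "__main__":
    # Hypothetical junction counts, for illustration only.
    print(orf0_isoform_ratio(j_ab=30, j_bc=10, j_0c=20, j_cd=40, j_de=40))
```

Averaging an upstream-based and a downstream-based estimate is what lets the ratio draw on junctions on both sides of the ORF0 splice site rather than on j(0c) and j(bc) alone.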

L1 Mobility Assays 

Luciferase-based L1 mobility reporters used were previously described (Xie 
et al., 2011). Cells were transfected/nucleofected with experimental 
constructs together with L1 mobility reporter plasmid pYX017. Luciferase 
activity was quantified at day 3 using the Dual-Luciferase Reporter 1000 
Assay System (Promega, E1980) and a Perkin Elmer Victor X Luminometer. A 
two-tailed t test was used for statistical analysis. 
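
As a rough illustration of how a percentage change in mobility could be derived from dual-luciferase readings, the sketch below normalizes each well and compares group means. It rests on assumptions not stated in the text: that the firefly signal reports retrotransposition and is divided by the Renilla signal from the same well, and that the percent change is taken between group means. All readings are invented for illustration and are not data from this study.

```python
from statistics import mean

def normalized_signal(firefly, renilla):
    """Return per-well firefly/Renilla ratios (assumed normalization)."""
    if len(firefly) != len(renilla):
        raise ValueError("firefly and renilla must have the same length")
    return [f / r for f, r in zip(firefly, renilla)]

def percent_change(treated, control):
    """Percent change of the treated group mean relative to the control mean."""
    control_mean = mean(control)
    return 100.0 * (mean(treated) - control_mean) / control_mean

if __name__ == "__main__":
    # Hypothetical luminometer readings (arbitrary units), three wells each.
    control = normalized_signal([100, 110, 90], [1000, 1000, 1000])
    with_orf0 = normalized_signal([140, 150, 130], [1000, 1000, 1000])
    print(f"{percent_change(with_orf0, control):.1f}% change in mobility signal")
```

A significance test such as the two-tailed t test named above would then be applied to the per-well normalized values rather than to the group means.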

Promoter Activity Assays 

Promoter activity was measured by co-transfecting ORF0 promoter constructs 
cloned into pGL4.10 (Promega) along with the normalization vector phRLTK 
(Promega). Activity was measured after 2 days, as in the L1 activity 
assays. A two-tailed t test was used for statistical analysis. 



Antibody Generation and Immunoprecipitations 

Peptides corresponding to ORF0 amino acid residues 20-34, 33-49, and 50-65 
in the L1HS consensus were synthesized, conjugated to KLH, and used in 
generation of rabbit polyclonal antibodies (Covance). For 
immunoprecipitations (IPs), cells were washed with DPBS, collected, and 
frozen. Cell pellets were thawed in mDm lysis buffer (25 mM Tris [pH 7.5], 
150 mM NaCl, 1.5 mM MgCl2, 1% Triton X-100, 1 mM DTT, protease inhibitors 
[Roche]) and supernatant from a 15,000 × g 15 min spin was used in IPs. 
Control and ORF0 antibodies were conjugated to magnetic beads (Pierce). IP 
duration was 4-6 hr, washes were done with the mDm buffer, and beads were 
heated to 95°C for 10-12 min for elution. 

Proteomic Sample Prep and Analysis 

Samples were precipitated by methanol/chloroform. Dried pellets were 
dissolved in 8 M urea/100 mM Tris [pH 8.5]. Proteins were reduced with 
5 mM tris(2-carboxyethyl) phosphine hydrochloride (TCEP, Sigma-Aldrich) and 
alkylated with 10 mM iodoacetamide (Sigma-Aldrich). Proteins were digested 
overnight at 37°C in 2 M urea/100 mM Tris [pH 8.5] with trypsin (Promega). 
Digestion was quenched with formic acid (5% final concentration) in a final 
volume of 50 μl. 

The digested samples were analyzed on a Fusion Orbitrap tribrid mass 
spectrometer (Thermo). Samples were analyzed with injections of 8 μl of the 
protein digest per LC/MS run. The digest was injected directly onto a 
40-cm, 75-μm ID column packed with BEH 1.7 μm C18 resin (Waters). Samples 
were separated at a flow rate of 200 nl/min on a nLC 1000 (Thermo). Buffer 
A and B were 0.1% formic acid in water and acetonitrile, respectively. Two 
reverse phase gradients of 140 min and 450 min were used to maximize 
sampling efficiency of the digest. Ninety percent buffer B was used for 
10 min final washes at the ends of gradients. Column was re-equilibrated 
with 20 μl of buffer A prior to the injection of sample. Peptides were 
eluted directly from the tip of the column and nanosprayed into the mass 
spectrometer by application of 2.5 kV voltage at the back of the column. 
The Orbitrap Fusion was operated in a data-dependent mode. Full MS1 scans 
were collected in the Orbitrap at 120 K resolution with a mass range of 
400-1,600 m/z and an AGC target of 5e®. The cycle time was set to 3 s, and 
within this 3 s the most abundant ions per scan were selected for CID MS/MS 
in the ion trap with an AGC target of 1 e"^ and minimum intensity of 5,000. 
Maximum fill times were set to 50 ms for MS scans and 100 and 35 ms for 
MS/MS scans in the 140 min and 450 min methods, respectively. Quadrupole 
isolation at 1.6 m/z was used, monoisotopic precursor selection was 
enabled, and dynamic exclusion was used with an exclusion duration of 5 s. 

Protein and peptide identification were done with Integrated Proteomics 
Pipeline-IP2 (Integrated Proteomics Applications). Tandem mass spectra were 
extracted from raw files using RawConverter and searched with ProLuCID 
against ORF0-human UniProt database. The search space included all fully 
tryptic and half-tryptic peptide candidates. Carbamidomethylation on 
cysteine was considered as a static modification. Data were searched with 
50 ppm precursor ion tolerance and 500 ppm fragment ion tolerance. Data 
were filtered to 10 ppm precursor ion tolerance post search. Identified 
proteins were filtered using DTASelect (Tabb et al., 2002) and utilizing a 
target-decoy database search strategy to control the false discovery rate 
to 1% at the protein level. 
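
The two precursor tolerances mentioned above (50 ppm for the search, 10 ppm for the post-search filter) both rest on the standard parts-per-million mass error. A minimal sketch of that arithmetic follows; the function names and the example m/z values are hypothetical and are not taken from IP2, ProLuCID, or DTASelect.

```python
def ppm_error(observed_mz, theoretical_mz):
    """Signed mass error of an observed m/z in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6

def within_tolerance(observed_mz, theoretical_mz, tol_ppm=10.0):
    """True if the absolute ppm error is at or below the tolerance."""
    return abs(ppm_error(observed_mz, theoretical_mz)) <= tol_ppm

if __name__ == "__main__":
    theoretical = 500.0  # hypothetical precursor m/z
    for observed in (500.004, 500.02):
        err = ppm_error(observed, theoretical)
        print(f"{observed}: {err:.1f} ppm, "
              f"search(50 ppm)={within_tolerance(observed, theoretical, 50.0)}, "
              f"filter(10 ppm)={within_tolerance(observed, theoretical, 10.0)}")
```

Searching with a wide window and filtering to a narrow one afterwards means a candidate such as the second example would be considered during the search but discarded by the 10 ppm filter.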

Imaging 

All imaging was carried out using a Zeiss LSM 780 Confocal Microscope. 
Images were taken using either a 20× or a 100× oil objective. The z stack 
intervals were 1 μm. Image analysis was performed with ZEN (Zeiss) and 
Imaris (Bitplane). Both PML and ORF0 foci were identified using the Spots 
object on Imaris (Bitplane) using a fixed spot size of 0.5 μm (the measured 
average XY diameter of nuclear bodies). 

SUPPLEMENTAL INFORMATION 

Supplemental Information includes four figures, two tables, and one movie 
and can be found with this article online at 
http://dx.doi.org/10.1016/j.cell.2015.09.025. 

• Promiscuous variants are abundant in sequence space and 
connected to specific variants 



Aakre et al., 2015, Cell 163, 1-13 
October 22, 2015 ©2015 Elsevier Inc. 
http://dx.doi.org/10.1016/j.cell.2015.09.055 

CellPress 

Please cite this article in press as: Aakre et al., Evolving New Protein-Protein Interaction Specificity through Promiscuous Intermediates, Cell 
(2015), http://dx.doi.org/10.1016/j.cell.2015.09.055 

Article 



Evolving New Protein-Protein Interaction 
Specificity through Promiscuous Intermediates 

Christopher D. Aakre, Julien Herrou, Tuyen N. Phung, Barrett S. Perchuk, Sean Crosson, and Michael T. Laub* 

Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02139, USA 
Howard Hughes Medical Institute, Massachusetts Institute of Technology, Cambridge, MA 02139, USA 
Department of Biochemistry and Molecular Biology, University of Chicago, Chicago, IL 60637, USA 
*Correspondence: laub@mit.edu 



SUMMARY 

Interacting proteins typically coevolve, and the 
identification of coevolving amino acids can pinpoint 
residues required for interaction specificity. This 
approach often assumes that an interface-disrupting 
mutation in one protein drives selection of a compen- 
satory mutation in its partner during evolution. 
However, this model requires a non-functional inter- 
mediate state prior to the compensatory change. 
Alternatively, a mutation in one protein could first 
broaden its specificity, allowing changes in its part- 
ner, followed by a specificity-restricting mutation. 
Using bacterial toxin-antitoxin systems, we demon- 
strate the plausibility of this second, promiscuity- 
based model. By screening large libraries of interface 
mutants, we show that toxins and antitoxins with 
high specificity are frequently connected in sequence 
space to more promiscuous variants that can serve 
as intermediates during a reprogramming of interac- 
tion specificity. We propose that the abundance of 
promiscuous variants promotes the expansion and 
diversification of toxin-antitoxin systems and other 
paralogous protein families during evolution. 

INTRODUCTION 

Many interacting proteins within the same cell, particularly 
signaling proteins, are members of large paralogous families 
that have expanded through duplication and divergence. To 
expand in number, paralogous interacting proteins typically 
must become specific after duplication to avoid unwanted 
cross-talk (Capra et al., 2012; Zarrinpar et al., 2003). The speci- 
ficity determinants of protein-protein interactions remain poorly 
defined in most systems. Even in the cases where they have 
been identified, we lack a detailed understanding of how a 
new, insulated protein-protein interaction emerges during the 
course of evolution and, more generally, the mutational paths fol- 
lowed during protein evolution (DePristo et al., 2005). 

Computational studies demonstrate that interacting proteins 
often coevolve. Indeed, identification of coevolving residues 
has helped guide identification of the specificity determinants 
of many protein-protein interfaces (Ovchinnikov et al., 2014; 



Skerker et al., 2008). The implicit notion or underlying model 
behind these analyses is usually that an interaction-disrupting 
mutation in one protein can be rescued by a mutation in its part- 
ner (Figure 1A). This model, which we call the compensatory 
mutation model, implies that the system passes through a non- 
functional or non-interacting state. However, such a state is 
highly unlikely, particularly for a protein-protein interaction that 
is critical for the viability of an organism. Alternatively, the spec- 
ificity of a given protein-protein interaction could change, and 
become insulated from other paralogous systems, if one of the 
proteins passes through a promiscuous intermediate (Figure 1B). 
In this model, an initial mutation in protein A would broaden its 
specificity, enabling its partner, protein B, to accumulate a muta- 
tion that would have disrupted its interaction with the original, 
ancestral form of protein A. A subsequent mutation in protein A 
would then narrow its specificity to include the derived, but not 
the ancestral, form of protein B. In this promiscuous intermediate 
model, the specificities of the interacting proteins change 
without ever transitioning through a non-functional intermediate 
state. Note that in both models, A and B continue to interact 
through the same set of interfacial residues and do not evolve 
an alternative interface de novo (Kuriyan and Eisenberg, 2007). 

Which of the two models in Figure 1 applies to most pairs of 
interacting proteins is unclear. In each case, the mutational tra- 
jectory involved would produce a signature of pairwise amino- 
acid coevolution in the phylogenetic record. However, only the 
latter, promiscuous intermediate model invokes the existence 
of mutations that are transiently introduced to broaden the spec- 
ificity of one of the two proteins. The prevalence of such promis- 
cuous states is unknown, as is whether they are easily reached 
from more specific, extant states. 

Bacterial toxin-antitoxin (TA) systems provide an excellent 
model system for dissecting the coevolutionary dynamics of pro- 
tein-protein interactions. Originally identified on plasmids, these 
systems are widely found in bacterial chromosomes, with many 
species encoding multiple, paralogous copies that share exten- 
sive similarity at the sequence and structural levels (Leplae et al., 
2011). The biological function of TA systems is unclear, but they 
have been implicated in stress responses, resistance to phage, 
formation of persister cells, and bacterial pathogenicity (Yamaguchi 
et al., 2011). Typically, the toxin is a stable, globular protein 
that can inhibit cell growth or viability unless antagonized by a 
cognate antitoxin that directly binds and sequesters the toxin. 
Changes in the degradation rate or synthesis of the antitoxin 
can trigger release of the toxin. A toxin is typically encoded in 

Figure 1. Models for the Evolution of New Protein-Protein Interaction 
Specificity 

(A) In a model of coevolution through compensatory mutations, an initial mu- 
tation in protein A that disrupts the A-B interaction is rescued by a compen- 
satory mutation in protein B. Ovals represent the set of protein B variants that 
are bound by protein A, and Xs indicate particular protein B variants. Note that 
the intermediate state is a non-functional interaction. 

(B) In an alternative model for protein coevolution, protein A first accumulates a 
mutation that broadens its specificity, followed by a second mutation in protein 
B that retains its interaction with the new form of A but that would have dis- 
rupted its interaction with the ancestral form of protein A. In a final step, protein 
A mutates to narrow its specificity to include the derived, and not ancestral, 
form of protein B. 



the same operon as an antitoxin, and toxin-antitoxin paralogs 
frequently arise through operon duplications. An unresolved 
question is whether toxin-antitoxin systems interact in an exclu- 
sive one-to-one manner. Genetic data suggest that these inter- 
actions may be specific (Fiebig et al., 2010), and the growth 
inhibitory effects of a toxin are usually rescued only by express- 
ing its co-operonic antitoxin (Hallez et al., 2010; Ramage et al., 
2009). However, interaction specificity has only been directly 
tested in a limited number of cases, and some groups have sug- 
gested that toxins and antitoxins encoded in different operons 
are capable of interacting in vivo and in vitro, possibly forming 
large, promiscuous networks (Yang et al., 2010; Zhu et al., 2010). 

Here, we systematically measure the binding preferences of 
20 ParD-ParE TA family members and find that these toxins 
and antitoxins are highly specific, interacting almost exclusively 
with their partner from the same operon. This specificity is en- 
coded by a small set of coevolving residues at the toxin-antitoxin 
interface, and mutations in these residues are sufficient to repro- 
gram a ParD antitoxin to interact with non-cognate ParE toxins. 
Guided by these findings, we generated a library with ~10⁴ variants 
of the key, specificity-determining residues in a ParD antitoxin 
and selected mutants that antagonize the cognate toxin, a 
non-cognate toxin, or both. Strikingly, we find that promiscuous 



variants that antagonize multiple toxins are easily obtained and 
are also highly connected in sequence space to specific variants. 
These results suggest that mutational paths leading to changes 
in toxin-antitoxin specificity are likely to involve promiscuous 
intermediates. Such paths enable the reprogramming of toxin- 
antitoxin specificity through the pairwise coevolution of interfa- 
cial residues, but without passing through an intermediate state 
that disrupts the protein-protein interaction. The abundance of 
promiscuous states likely facilitates the evolutionary expansion 
of these and other paralogous protein families following operon 
and whole-genome duplications during evolution. 

RESULTS 

Toxins and Antitoxins from the ParDE Family Exhibit 
High Interaction Specificity 

To systematically measure the interaction specificity of TA sys- 
tems, we focused on the ParD-ParE family, which is often found 
in multiple copies on bacterial chromosomes (Fiebig et al., 2010; 
Leplae et al., 2011) (Figure S1A). We initially cloned the three 
chromosomally encoded ParD-ParE pairs from the α-proteobacterium 
Mesorhizobium opportunistum into vectors that allow 
for separate and inducible expression of the ParE toxin and ParD 
antitoxin. To measure the interaction specificity for these pairs, 
we then co-transformed all pairwise combinations of toxin and 
antitoxin plasmids into E. coli and assessed whether the induced 
expression of each ParD antitoxin rescues the growth arrest re- 
sulting from inducing each ParE. As a control, we first confirmed 
that inducing each ParE toxin inhibited growth of E. coli (Fig- 
ure 2A). Then, plating on a medium that induces both ParD 
and ParE, we observed growth for each of the three cognate 
ParD-ParE pairings (Figure 2A). No growth was observed for 
the six non-cognate pairs, indicating that the ParD antitoxins 
from M. opportunistum can only neutralize their cognate ParE 
toxins. 

We extended this analysis to the 20 chromosomally encoded 
ParDE pairs from eight different bacteria, including the three 
pairs from M. opportunistum (Figure S1B). For this 20 × 20 matrix 
of ParD and ParE pairs, we observed strong interactions between 
all 20 co-operonic ParDE pairs, but only 11 of the 380 (or 
3%) other possible pairings (Figure 2B). Importantly, these 
cross-reactions were only observed between ParD and ParE 
proteins not encoded in the same species, indicating that the 
ParDE pairs within a given organism are typically insulated 
from one another. These results indicate that ParD antitoxins 
are highly specific for their cognate ParE toxins. 

Identification of Covarying Residues in ParD and ParE 

As a first step in understanding the molecular basis of specificity 
in ParD-ParE complexes, we solved a 1.59-Å cocrystal structure 
of the M. opportunistum ParD3 antitoxin bound to ParE3, its 
cognate toxin. This structure revealed a heterotetrameric asym- 
metric unit composed of ParD3 and ParE3 dimers (Figure S2A), 
similar to a C. crescentus ParD-ParE structure (Dalton and 
Crosson, 2010). Crystal packing and an estimated mass of 
~87 kDa in solution indicate that the biological assembly is 
composed of two tetramers (Figures S2B and S2C). Within this 
complex, each ParD3 subunit makes extensive contacts with a 
neighboring ParE3 subunit primarily through its second and third 
alpha helices, with a total buried surface area of 1,624 Å² 
(Figure 3A). 

Previous work with bacterial two-component signaling sys- 
tems demonstrated that their interaction specificity is controlled 
by a subset of residues at the protein-protein interface formed 
by a histidine kinase and response regulator (Skerker et al., 
2008). These specificity-determining residues coevolve to main- 
tain the interaction between cognate signaling proteins. Thus, to 
pinpoint the residues that contribute to the specificity of ParD- 
ParE interactions, we used GREMLIN, a pseudo-likelihood- 



Figure 2. Toxins and Antitoxins from the 
ParD-ParE Family Exhibit High Interaction 
Specificity 

(A) Testing of interaction specificity for ParD antitoxins 
and ParE toxins from Mesorhizobium opportunistum. 
Plasmids harboring the toxins and 
antitoxins indicated were co-transformed into 
E. coli with ParD and ParE induced as indicated. 

(B) Comprehensive testing of interaction specificity 
for 20 ParD and ParE pairs from eight different 
species. Cells containing each possible ParD-ParE 
pair were grown on plates that induce the toxin and 
antitoxin, respectively, and grown overnight at 
37°C. Yellow, visible colonies following serial 
dilution; black, no visible colonies. 

See Figure S1. 



based model for coevolution (Kamisetty 
et al., 2013; Ovchinnikov et al., 2014), to 
search for residues that strongly covary 
in a multiple sequence alignment of 
concatenated, co-operonic ParD and 
ParE proteins. This analysis identified 10 
residues in ParD and 11 residues in 
ParE that coevolve most strongly. Here- 
after, we call these 21 amino acids 
“specificity” residues, as our work below 
indicates that they play the dominant role 
in determining partner specificity. Map- 
ping these specificity residues onto the 
ParD3-ParE3 crystal structure indicated 
that they cluster into two groups at the 
primary molecular interface formed by 
these proteins (Figures 3B and 3C). The 
first group sits at the base of the second 
alpha helix in ParD3 and covaries with 
residues in the three-stranded beta sheet 
in ParE3. The second group clusters in 
the third alpha helix in ParD3 and cova- 
ries with residues in the first and second 
alpha helices of ParE3. We also used 
GREMLIN to identify residues within 
each protein (four in ParD and six in 
ParE) that coevolve with the specificity 
residues (Figures 3C and S3A). These 
“supporting” residues may indirectly 
contribute to ParD-ParE interaction spec- 
ificity by influencing the orientation or packing of the interfacial 
specificity residues. 

Covarying Residues Dictate Interaction Specificity 
in the ParD-ParE Family 

To determine whether the coevolving residues identified are suf- 
ficient to dictate interaction specificity of the ParD-ParE family, 
we constructed a series of chimeric proteins in which different 
regions of the M. opportunistum ParD3 were replaced with the 
corresponding regions of ParD1 or ParD2 (Figure S3B). Replac- 
ing the entire C-terminal region of ParD3 with the corresponding 

Figure 3. Covarying Residues Dictate Interaction Specificity in the ParD-ParE Family 

(A) Structure of the M. opportunistum ParD3-ParE3 complex (PDB: 5CEG). Light orange, ParE3 monomer; light blue, ParD3 monomer. 

(B) A section of the ParD3-ParE3 structure from (A) magnified; covarying residues shown in space-filling representation. 

(C) Alignment of M. opportunistum ParD and ParE paralogs with coevolving residues highlighted in blue or orange for ParD or ParE, respectively. Supporting 
residues, which coevolve with the interfacial coevolving residues, are highlighted in gray. 

(D) Mutations in the C terminus of ParD3 can reprogram interaction specificity. The indicated ParD3 mutants were tested against each ParE homolog from 
M. opportunistum using the E. coli toxicity-rescue assay. 

Also see Figures S2 and S3. 



region of ParD1 or ParD2 produced a chimera that lost its ability 
to interact with ParE3 but gained the ability to interact with ParE1 
or ParE2 (Figure 3D). These chimeras involved both clusters of 
interfacial residues identified as coevolving between ParD and 
ParE proteins. Replacing only one of these clusters in the 
ParD3 C terminus was sometimes sufficient to reprogram specificity, 
but depended on the toxin tested (Figure S3C). These results 
indicate that the C-terminal region of ParD, which contains 
the specificity and supporting residues, is sufficient to dictate 
interaction specificity. 

To pinpoint the residues required for interaction specificity, we 
focused additional mutagenesis on the coevolving residues 
identified computationally. We generated variants of ParD3 in 
which all of the specificity and supporting residues were replaced 
with the corresponding residues in ParD1 or ParD2, for 
a total of 8 or 9 substitutions, respectively. In each case, we 
found that these mutations were sufficient to reprogram ParD3 
to interact with ParE1 or ParE2 and lose its ability to interact 



with ParE3 (Figure 3D). Interestingly, ParD3 could be reprogrammed 
to interact with ParE1 or ParE2 with fewer substitutions. 
For example, we found sets of four substitutions that were sufficient 
to reprogram ParD3 to interact with ParE1 or ParE2 (Figure 3D). 
Taken together, our results indicate that mutating the 
most highly coevolving residues in an antitoxin can be sufficient 
to reprogram its interaction specificity, and, in some cases, 
mutating only a subset of these residues allows a complete 
switch in partner specificity. 

High-Throughput Mapping of Interface Mutant Fitness 

The results presented above indicate that antitoxin interaction 
specificity can be reprogrammed by changing just four residues. 
But how does specificity change as these four individual substi- 
tutions are introduced and does the substitution order matter? 
Does the specificity of antagonizing one ParE toxin to another 
change abruptly, or are there promiscuous mutational intermedi- 
ates? To answer these questions, we sought to generate a large 
library of ParD3 variants that included combinations of residues 
shown to be specific for antagonizing ParE3 or ParE2, as well as 
the mutational intermediates separating these specific states. To 
this end, we generated a library of mutants at four of the key 
interfacial positions in the ParD3 antitoxin, Leu59, Trp60, Asp61, 
and Lys64 (LWDK). To reduce the complexity of our library, we 
only allowed residues at each library position that are commonly 
found in naturally occurring ParD homologs (see Experimental 
Procedures). The resulting library has a theoretical diversity of 
9,360 variants, with 12, 6, 13, and 10 possible residues encoded 
at the four respective positions of the library (Figure 4A). Deep- 
sequencing of the relevant region in parD3 in the initial library re- 
vealed that >98% of the predicted variants were represented by 
at least 10 reads and >94% had at least 100 reads (Figure S4A). 
Measurements of read numbers were highly reproducible between 
replicates (R² > 0.99, Figure S4B). 
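
The theoretical diversity quoted above is the product of the number of residues allowed at each of the four library positions. The sketch below reproduces that arithmetic and shows how such a combinatorial library can be enumerated. The toy alphabets are hypothetical placeholders chosen for illustration; they are not the residue sets used in the actual library (those are shown in Figure 4A).

```python
from itertools import product
from math import prod

# Number of residues allowed at each of the four library positions,
# as stated in the text (12, 6, 13, and 10).
allowed_per_position = [12, 6, 13, 10]
theoretical_diversity = prod(allowed_per_position)


def enumerate_library(alphabets):
    """Yield every variant as a string with one residue per position."""
    for combo in product(*alphabets):
        yield "".join(combo)


# Hypothetical toy alphabets (NOT the real library composition).
toy_alphabets = ["LA", "W", "DEK", "KL"]
toy_library = list(enumerate_library(toy_alphabets))

if __name__ == "__main__":
    print("theoretical diversity:", theoretical_diversity)
    print("toy library size:", len(toy_library))
```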

To assess the ability of each ParD3 variant to bind and antag- 
onize ParE3, we co-transformed E. coli with the ParD3 library 
and an inducible ParE3 vector. When cultured in conditions 
that do not induce ParD3, cell growth arrested within 200 min 
after inducing the ParE3 toxin (Figure 4B). In contrast, when 
the ParD3 library was expressed, growth slowed after inducing 
the toxin but eventually resumed, suggesting that some fraction 
of the population could neutralize ParE3 toxicity (Figure 4B). To 
determine which mutants neutralized ParE3 and hence were 
enriched during the course of this experiment, we harvested 
samples every 100 min and deep-sequenced the relevant re- 
gion of parD3. We observed large changes in the frequency 
of individual variants over this time course (Figure S4C). For 
example, the variant containing the wild-type ParD3 residues 
(LWDK) was enriched ~6-fold, whereas variants with frameshift 
mutations in parD3, which are presumably non-functional, were 
depleted ~7-fold (Figure S4C). To validate the functionality of 
variants inferred from this competitive growth assay, we iso- 
lated six mutants that exhibited different frequency dynamics 
following toxin induction (Figure 4C). We tested these six mu- 
tants individually using our toxicity-rescue assay and found 
clear agreement between the change in the frequency of 
each variant in the library and its individual plating efficiency 
(Figure 4D). 

To quantify differences in variant behavior during competitive 
growth, we generated a linear fit to the frequencies of each 
mutant as a function of time, and then calculated the log-fold 
expansion of each mutant relative to the rest of the population, 
producing a raw fitness value (W_raw) for each mutant. We then 
transformed these raw fitness values such that the W value for 
frameshift variants was 0 and the W value for the wild-type 
(LWDK) sequence was 1; the resulting distribution of W values 
ranged from -0.04 to 1.13 and was highly reproducible between 
biological replicates (Figure 4E, R² = 0.98). We found a 
total of 252 variants with W values > 0.5, representing 2.7% 
of the total (Figure 4F). This set included the wild-type combi- 
nation of residues (LWDK) and 31 single, 189 double, and 31 
triple mutants relative to the wild-type sequence (Figure S4D). 
There were no quadruple mutants, as position 60 was invari- 
antly tryptophan. The most common residues in this set as a 
whole were wild-type. However, the identification of 252 variants 
that can effectively antagonize ParE3 indicates a substantial 
degree of functional degeneracy in the ParD3 interfacial 
residues. 
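
The fitness calculation described above can be sketched in two steps: a raw fitness from a linear fit over the time course, followed by a rescaling that pins frameshift variants at 0 and the wild-type sequence at 1. The sketch below is one plausible reading, not the authors' exact procedure: it assumes the raw fitness is the least-squares slope of ln(f / (1 - f)) against time, where f is the variant's frequency, so that the ratio compares the variant with the rest of the population. The function names and the example numbers are hypothetical.

```python
import math


def raw_fitness(times, freqs):
    """Least-squares slope of ln(f / (1 - f)) versus time.

    Assumption: this log-ratio stands in for the 'log-fold expansion
    relative to the rest of the population' described in the text.
    """
    ys = [math.log(f / (1.0 - f)) for f in freqs]
    n = len(times)
    t_mean = sum(times) / n
    y_mean = sum(ys) / n
    num = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, ys))
    den = sum((t - t_mean) ** 2 for t in times)
    return num / den


def normalize(w_raw, w_frameshift, w_wildtype):
    """Rescale so that frameshift variants map to 0 and wild type to 1."""
    return (w_raw - w_frameshift) / (w_wildtype - w_frameshift)


if __name__ == "__main__":
    # Hypothetical time course (minutes) and frequencies for one variant.
    times = [0, 100, 200, 300]
    freqs = [0.0010, 0.0015, 0.0022, 0.0033]
    print("raw fitness:", raw_fitness(times, freqs))
```

With this rescaling, a value above 1 means the variant expanded faster than wild type, and a value below 0 means it was depleted faster than the frameshift variants, which is consistent with the reported range of -0.04 to 1.13.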

Next, to assess the ability of each ParD3 variant to antagonize 
the non-cognate toxin ParE2, we repeated the competitive 
growth experiment but co-transformed E. coli with our ParD3 li- 
brary and an inducible ParE2 vector. As before, we observed 
growth rescue following ParD3 library expression with large 
changes in the frequency of individual variants over time (Figures 
4G and S4E). However, the frequency changes observed here 
differed from those observed against the cognate toxin ParE3. 
For example, a variant containing the specificity residues found 
in the native ParD2 antitoxin, AWIL, was enriched in the ParD3 
library screened against ParE2 but was depleted when screened 
against ParE3 (Figures 4C and 4H). We quantified variant fitness 
as before and found a total of 151 variants (1.6% of the total) 
capable of antagonizing ParE2 with W values > 0.5 (Figures 4I 
and S4E). The most common residues were Ala59, Trp60 
(invariant), Leu61, and Leu64. However, we noted important differences 
between variants reactive against ParE2 and ParE3, 
particularly at the last two variable positions in our library. 
ParE2-specific variants tended to have small hydrophobic or 
positively charged residues at position 61, whereas ParE3-specific 
variants favored negatively charged residues at this position 
(Figures 4F and 4I). Additionally, ParE2-specific variants were 
more likely to contain small hydrophobic residues at position 
64, whereas ParE3-specific variants tended to have positively 
charged residues (Figures 4F and 4I). 

Mutational Paths That Reprogram Specificity Tend 
to Involve Promiscuous Variants 

To more systematically probe the sequence space governing the 
specificity of ParD3, we generated a scatterplot of ParD3 variant 
fitness when screened against the ParE2 or ParE3 toxin (Fig- 
ure 5A). This analysis revealed variants spanning all ranges of 
fitness, including those capable of antagonizing ParE2, ParE3, 
or both toxins simultaneously. We identified a total of 31 promis- 
cuous variants (W > 0.5 for both toxins), which represents a sub- 
set of the 252 ParE3-reactive and 151 ParE2-reactive variants 
(Figure 5B). We then grouped variants by specificity class (Fig- 
ure S5A) and found that the promiscuous variants, such as 
LWEL, tended to harbor sequence elements from both ParD3 
and ParD2, often with negatively charged residues at position 
61 (ParD3-like) and aliphatic residues at position 64 (ParD2- 
like) (Figure 5C). 
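
The specificity classes used here follow directly from the two fitness values and the W > 0.5 threshold. A minimal sketch of that classification is given below; the function name and class labels are illustrative choices, and the final line only checks the bookkeeping implied by the reported counts (252 ParE3-reactive, 151 ParE2-reactive, 31 shared).

```python
def specificity_class(w_e2, w_e3, threshold=0.5):
    """Assign a variant to a class from its fitness against ParE2 and ParE3."""
    reacts_e2 = w_e2 > threshold
    reacts_e3 = w_e3 > threshold
    if reacts_e2 and reacts_e3:
        return "promiscuous"
    if reacts_e3:
        return "ParE3-reactive only"
    if reacts_e2:
        return "ParE2-reactive only"
    return "non-functional"


# Counts reported in the text.
n_e3_reactive = 252
n_e2_reactive = 151
n_promiscuous = 31

# Variants reactive against at least one toxin (inclusion-exclusion).
n_functional = n_e3_reactive + n_e2_reactive - n_promiscuous

if __name__ == "__main__":
    print(specificity_class(w_e2=0.8, w_e3=0.9))
    print("functional variants:", n_functional)
```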

To visualize the connectivity of functional variants in sequence 
space, we created a force-directed graph where individual no- 
des represent functional variants with lines connecting variants 
that differ by a single amino acid (Figure 5D). Node sizes increase 
with greater connectivity and node colors represent the speci- 
ficity class of a given variant (Figure 5D). The resulting graph 
was densely interconnected but generally grouped variants 
based on their specificity. The average number of edges per 
node, or degree, was 17.8 and ranged from 7 to 31. However, 
we noted that the average number of edges per node was 
23% higher for promiscuous variants than for variants specific 
for ParE2 or ParE3 (Figure 5E). We also generated a force- 
directed graph in which edges represent variants that differ by 
a single-nucleotide substitution, following the standard genetic 

Figure 4. High-Throughput Mapping of Mutant Fitness at Co-evolving Interface 

(A) Composition of the ParD3 antitoxin library at the four variable positions. 

(B) Library growth following ParE3 toxin induction. 

(C) Frequency changes over time for the indicated ParD3 variants following ParE3 induction. 

(D) Testing of individual variants from (C) using the toxicity rescue assay. 10-fold serial dilutions were plated from cultures expressing the ParD3 variant indicated 
and the ParE3 toxin. 

(E) Two biological replicates of fitness measurements derived from screening the ParD3 library against the ParE3 toxin. 

(F) Frequency logo for ParD3 library variants with high fitness against ParE3 (W_E3 > 0.5). 

(G) Library growth following induction of the non-cognate ParE2 toxin. 

(H) Frequency changes over time for the indicated ParD3 library variants. 

(I) Frequency logo for ParD3 library variants with high fitness against ParE2 (W_E2 > 0.5). 

Also see Figure S4. 

code (Figure S5B). For this graph, promiscuous variants were, on 
average, 31% more connected to other nodes than their ParE2- 
or ParE3-specific counterparts (Figure 5E). This increased connectivity 
of promiscuous variants was highly significant for 
both amino acid and nucleotide graphs, as it was lost when 
the edges of each graph were randomly shuffled (p < 10⁻⁴, 

Figure 5. Specificity-Reprogramming Paths Are Highly Enriched for Promiscuous Variants 

(A) Fitness of ParD3 variants against ParE2 and ParE3. Green, specific for ParE3; blue, capable of antagonizing both ParE2 and ParE3; red, specific for ParE2. 
Histograms of fitness values against ParE2 and ParE3 are shown. 

(B) Venn diagram of ParD3 variants reactive against ParE3, ParE2, or both. 

(C) Frequency logo of promiscuous ParD3 variants (W_E2 > 0.5, W_E3 > 0.5). 

(D) Force-directed graph of all ParD3 variants reactive against ParE3 or ParE2 (W > 0.5). Nodes represent individual variants and edges represent single amino- 
acid substitutions. Node size scales with increasing degree and color corresponds to the specificity classes in (A). 

(E) Average number of edges per node for the indicated categories of ParD3 variants. Error bars indicate SEM. 

(F) Examples of “switch-like” and “promiscuity-based” mutational paths from an E3-specific variant to an E2-specific variant with the fitness against each variant 
color-coded based on the scale shown. 

(G) Left, percentage of “switch-like” and “promiscuity-based” paths from the wild-type ParD3 sequence (LWDK) to each of the 66 ParE2-specific variants (W_E2 > 
0.5, W_E3 < 0.1). Right, same as left panel but for 10,000 simulations in which the graph edges were randomly shuffled while keeping the total edge count and 
degree distribution constant. Error bars represent SEM. 

(H) Histogram representing percentage of “promiscuity-based” paths in 10,000 edge shuffling simulations; red line indicates percentage for the observed amino 
acid graph. 

Also see Figure S5. 

Figure 6. Mutational Order Dictates Specificity Class of Intermediate Variants 

(A) Mutational paths from LWDK to LWKL for ParD3 with fitness of each variant against ParE2 and ParE3 shown as a heatmap: yellow, high fitness; black, low 
fitness. 

(B) The six path types that reprogram ParD3 specificity in two mutational steps. Percentage of mutational paths in each category is indicated for a threshold of 0.5 
used to define a positive interaction. 

Also see Figure S6. 



Figures S5C and S5D). The high connectivity of promiscuous 
variants was even more pronounced with a more stringent defi- 
nition of specificity (Figure S5E). 
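
The connectivity comparison can be illustrated with a small sequence-space graph in which two variants are joined when they differ at exactly one amino acid, and the degree of a node is its number of such neighbors. The sketch below is a toy example under stated assumptions: it uses only six variants named in the text, with class labels taken from how the text describes them, so the resulting numbers say nothing about the real graph of functional variants.

```python
def is_neighbor(a, b):
    """True if two equal-length variants differ at exactly one position."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1


def degrees(variants):
    """Number of single-substitution neighbors for each variant."""
    variants = list(variants)
    deg = {v: 0 for v in variants}
    for i, a in enumerate(variants):
        for b in variants[i + 1:]:
            if is_neighbor(a, b):
                deg[a] += 1
                deg[b] += 1
    return deg


def mean_degree_by_class(variant_class):
    """Average degree of the nodes in each specificity class."""
    deg = degrees(variant_class)
    grouped = {}
    for variant, cls in variant_class.items():
        grouped.setdefault(cls, []).append(deg[variant])
    return {cls: sum(d) / len(d) for cls, d in grouped.items()}


# Toy subset of variants mentioned in the text (illustrative only).
toy_variants = {
    "LWDK": "E3",    # wild-type ParD3
    "LWKK": "E3",    # D61K single mutant
    "LWDL": "both",  # K64L single mutant
    "LWEL": "both",
    "LWKL": "E2",
    "AWIL": "E2",    # ParD2-like specificity residues
}

if __name__ == "__main__":
    print(mean_degree_by_class(toy_variants))
```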

The dense connectivity of promiscuous variants suggested 
that mutational paths that change ParD3 specificity (from 
ParE3-specific to ParE2-specific, or vice versa) tend to travel 
through promiscuous intermediates. To test this hypothesis, 
we first defined two types of specificity-reprogramming paths. 
Note that for the following analysis, we exclude paths in which 
ParD3 fails to interact with both ParE3 and ParE2 (also see Dis- 
cussion). The first class of paths are “switch-like” and only 
involve intermediates that are specific for ParE2 or ParE3, 
whereas the second class of paths are “promiscuity-based” 
and travel through at least one intermediate that can inhibit 
both ParE2 and ParE3 (Figure 5F). To determine whether paths 
that change the interaction specificity of ParD3 tend to be 
switch-like or promiscuity-based, we identified all shortest muta- 
tional paths from the wild-type ParD3 variant (LWDK) to each of 
the 66 variants that are highly specific for ParE2 (W_E2 > 0.5, W_E3 < 
0.1; Figure S5A); for this analysis, each mutational step involved 
a single amino-acid substitution. We found a total of 370 shortest 
paths, of which 40% involved a promiscuous intermediate (Fig- 
ure 5G). The percentage of paths via promiscuous intermediates 
increased to 61 % when considering only paths that involve sin- 
gle-nucleotide substitutions (Figure 5G). 
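
The path analysis can be sketched as follows: enumerate every shortest path between two variants in the single-substitution graph, then call a path "promiscuity-based" if any intermediate node is promiscuous and "switch-like" otherwise. The code below is a minimal illustration using breadth-first search, not the authors' implementation. The four-variant toy graph and its class labels follow the text's description of LWDK, LWDL, LWKK (the D61K single mutant), and LWKL.

```python
from collections import deque


def is_neighbor(a, b):
    """True if two equal-length variants differ at exactly one position."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1


def build_graph(variants):
    """Adjacency sets for the single-substitution graph."""
    variants = list(variants)
    graph = {v: set() for v in variants}
    for i, a in enumerate(variants):
        for b in variants[i + 1:]:
            if is_neighbor(a, b):
                graph[a].add(b)
                graph[b].add(a)
    return graph


def all_shortest_paths(graph, source, target):
    """Enumerate all shortest paths from source to target."""
    # Breadth-first search gives the distance of every node from source.
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in graph[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    if target not in dist:
        return []
    paths = []

    def extend(path):
        node = path[-1]
        if node == target:
            paths.append(list(path))
            return
        for nb in sorted(graph[node]):
            # Only step to nodes one level further out, never past the target.
            if dist[nb] == dist[node] + 1 and dist[nb] <= dist[target]:
                path.append(nb)
                extend(path)
                path.pop()

    extend([source])
    return paths


def path_type(path, variant_class):
    """Classify a path by the specificity of its intermediate nodes."""
    if any(variant_class[v] == "both" for v in path[1:-1]):
        return "promiscuity-based"
    return "switch-like"


toy_class = {"LWDK": "E3", "LWDL": "both", "LWKK": "E3", "LWKL": "E2"}
toy_graph = build_graph(toy_class)
toy_paths = all_shortest_paths(toy_graph, "LWDK", "LWKL")

if __name__ == "__main__":
    for p in toy_paths:
        print(" -> ".join(p), ":", path_type(p, toy_class))
```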

To determine whether the number of paths that involve pro- 
miscuous variants is greater than would be expected by chance, 
we generated graphs in which the edges were randomly shuf- 
fled, and again calculated the percentage of each class of paths 
from ParD3 (LWDK) to the ParE2 highly specific variants. For 
these graphs with randomized edges, the percentage of paths 
involving promiscuous intermediates dropped to 15% for the 
amino acid neighbor graph and 20% for the nucleotide neighbor 
graph (Figures 5G and 5H). Thus, the enrichment of promiscuity- 
based paths in the observed graphs is significant (p < 0.005) 
(Figures 5G, 5H, and S5F). Collectively, our results demonstrate the 
dense connectivity of functional variants in the sequence space 
governing ParD-ParE interaction specificity and reveal that 
specificity-reprogramming paths are highly enriched for those 
that involve promiscuous variants, which may facilitate the evo- 
lution of ParD-ParE systems with new specificities. 
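
A null model of this kind needs a way to shuffle edges while holding the edge count and every node's degree fixed, as stated in the Figure 5G legend. One standard technique for this is the double-edge swap, sketched below; whether the authors used this specific procedure is an assumption. Two edges (a, b) and (c, d) are replaced by (a, d) and (c, b) whenever that creates no self-loop or duplicate edge. The function name and the toy edge list are hypothetical.

```python
import random
from collections import Counter


def shuffle_edges(edges, n_swaps, seed=0, max_tries=100000):
    """Degree-preserving randomization by repeated double-edge swaps."""
    rng = random.Random(seed)
    edge_list = [tuple(e) for e in edges]
    edge_set = {frozenset(e) for e in edge_list}
    done = 0
    tries = 0
    while done < n_swaps and tries < max_tries:
        tries += 1
        i, j = rng.sample(range(len(edge_list)), 2)
        a, b = edge_list[i]
        c, d = edge_list[j]
        if rng.random() < 0.5:
            c, d = d, c  # randomize which endpoints get exchanged
        if a == d or c == b:
            continue  # swap would create a self-loop
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        if new1 in edge_set or new2 in edge_set:
            continue  # swap would create a duplicate edge
        edge_set.discard(frozenset((a, b)))
        edge_set.discard(frozenset((c, d)))
        edge_set.add(new1)
        edge_set.add(new2)
        edge_list[i] = (a, d)
        edge_list[j] = (c, b)
        done += 1
    return edge_list


def degree_counts(edges):
    """Degree of every node in an undirected edge list."""
    counts = Counter()
    for a, b in edges:
        counts[a] += 1
        counts[b] += 1
    return counts


# Hypothetical toy graph: a six-node ring plus one chord.
toy_edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (0, 3)]
shuffled_edges = shuffle_edges(toy_edges, n_swaps=20, seed=1)

if __name__ == "__main__":
    print(shuffled_edges)
```

Repeating the path classification on many such shuffled graphs yields a null distribution against which the observed percentage of promiscuity-based paths can be compared.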

Epistasis: Mutational Order Dictates Specificity Class 
of Intermediate Variants 

Inspection of the paths connecting ParD3 variants with different 
specificities indicated that the third and fourth library positions, 
residues 61 and 64 in ParD3, contribute significantly to the insu- 
lation of the ParD-ParE system. For instance, the wild-type res- 
idue combination in ParD3, LWDK, renders it specific for binding 
to ParE3, whereas the double-mutant variant LWKL is specific 
for ParE2. Strikingly, however, the two possible paths connect- 
ing LWDK and LWKL are in different classes (Figure 6A). A single 
ParD3 substitution (K64L in LWDL) resulted in promiscuous 
binding to ParE2 and ParE3, whereas a second substitution in 
this background (D61K in LWKL) resulted in specificity for 
ParE2 (Figure 6A). In contrast, incorporating these substitutions 
in the reverse order, D61K and then K64L, resulted in a switch- 
like change in specificity in which the initial D61K substitution 
retained specificity for ParE3, but then enabled the subsequent 
K64L substitution to produce a ParE2-specific antitoxin (Fig- 
ure 6A). These results underscore how a small number of muta- 
tions can fully reprogram protein-protein interaction specificity 
and demonstrate that the order of mutations can strongly affect 
whether the path to a new specificity state involves a promiscu- 
ous intermediate or a rapid switch. 

Our finding that changes in specificity can depend strongly on 
the order of substitutions represents a form of epistasis, broadly 
defined as cases where the functional effect of individual substi- 
tutions is context-dependent rather than additive and indepen- 
dent (Lehner, 2011). To more broadly quantify this epistasis for 
the ParD3 interfacial residues, we first defined six types of spec-
ificity-reprogramming paths that involve two amino-acid substi-
tutions (Figure 6B). Three of the six path types are epistatic, with
the two intermediates having different specificities, implying that
substitution order influences changes from ParE3 to ParE2 spec-
ificity. We quantified the path type for each case in which two
substitutions reprogram ParD3 from being specific for ParE3
(W_E3 > 0.5, W_E2 < 0.5) to being specific for ParE2 (W_E3 < 0.5,
W_E2 > 0.5) and found a total of 2,653 such cases, of which
92% were epistatic (Figure 6B). The percentage of epistatic
paths was robust to the threshold used for defining positive inter- 
actions (Figures S6A and S6B). Taken together, our results high- 
light the pervasive effects of epistasis on ParD function. Although 
studies of epistasis typically consider the interdependence of in- 
dividual substitutions with respect to protein folding or a single- 
protein function (Kondrashov and Kondrashov, 2015; Lehner, 
2011), our findings indicate that epistasis can also manifest at 
the level of interaction specificity. This form of epistasis may 
significantly impact the evolution of new ParD-ParE systems. 
Promiscuous intermediates enable a change in protein-protein 
interaction specificity without passing through a non-functional 
state, in which a liberated toxin would suppress growth and pro- 
liferation (Figure 1A). Thus, the epistasis documented here may 
fundamentally restrict mutational trajectories during evolution 
to those involving promiscuous intermediates. 
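
The path-typing step described above can be sketched in a few lines of Python. This is a minimal illustration, not the analysis code used in the study. It assumes that each ParD3 variant is summarized by two fitness values (W_E3, W_E2) and that 0.5 is the cutoff for a positive interaction, as stated in the text. The fitness numbers are invented, chosen only to match the classes reported for Figure 6A, and the function names are hypothetical. Calling a two-substitution path epistatic when its two single-mutant intermediates fall in different classes is one reading of the definition; it reproduces the count of six path types, three of them epistatic.

```python
from itertools import combinations_with_replacement

THRESHOLD = 0.5  # fitness cutoff for a positive interaction (from the text)


def classify(w_e3, w_e2, threshold=THRESHOLD):
    """Assign a specificity class from fitness against ParE3 and ParE2."""
    binds_e3 = w_e3 > threshold
    binds_e2 = w_e2 > threshold
    if binds_e3 and binds_e2:
        return "promiscuous"
    if binds_e3:
        return "E3-specific"
    if binds_e2:
        return "E2-specific"
    return "non-functional"


def path_is_epistatic(intermediate_1, intermediate_2):
    """Assumed definition: a two-substitution path is epistatic when its two
    single-mutant intermediates fall in different specificity classes."""
    return classify(*intermediate_1) != classify(*intermediate_2)


# Hypothetical (W_E3, W_E2) values, chosen to match the classes in Figure 6A.
fitness = {
    "LWDK": (0.9, 0.1),  # wild type, ParE3-specific
    "LWDL": (0.8, 0.8),  # K64L first: promiscuous
    "LWKK": (0.9, 0.1),  # D61K first: still ParE3-specific
    "LWKL": (0.1, 0.9),  # double mutant, ParE2-specific
}

classes = {variant: classify(*w) for variant, w in fitness.items()}
epistatic = path_is_epistatic(fitness["LWDL"], fitness["LWKK"])

# Unordered pairs of functional intermediate classes give six path types.
functional_classes = ["E3-specific", "promiscuous", "E2-specific"]
path_types = list(combinations_with_replacement(functional_classes, 2))
n_epistatic_types = sum(a != b for a, b in path_types)

print(classes)
print("LWDK -> LWKL epistatic:", epistatic)
print(len(path_types), "path types,", n_epistatic_types, "epistatic")
```

In this toy data set the LWDK to LWKL transition is epistatic because one intermediate (LWDL) is promiscuous while the other (LWKK) remains ParE3-specific, mirroring the order dependence described for Figure 6A.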

Mutational Trajectories to an Orthogonal ParD3-ParE3 
Pair 

Thus far, we have considered changes to one side of the ParD- 
ParE interface. To probe how the interaction specificity of a 
ParD-ParE protein pair coevolves, we sought to generate a 
variant of the toxin ParE3 that does not interact with ParD3,
and then select ParD3 variants from our library that can
neutralize this novel toxin. To this end, we generated a variant
of the toxin, called ParE3*, that retains toxicity but is incapable
of binding to the ParD3 antitoxin. In particular, we mutated five
ParE3 positions (Arg, Arg, Ala, Met, and Leu, or
RRAML) that strongly covary with the specificity residues in
ParD3. We mutated RRAML to VEIRF, as each individual variant
residue was frequently observed in ParE3 homologs and was
chemically different from the corresponding wild-type residue
(Figure S7A). As expected, we found that ParE3* retained toxicity
but was no longer neutralized by ParD3 (Figure 7A).

To determine whether variants in the ParD3 library neutralized
ParE3*, we performed a competitive growth experiment
following co-transformation. As before, we converted changes
in variant frequencies to fitness values, which were highly repro-
ducible (R² = 0.96, Figure S7B). Sequence analysis of the high-
fitness mutants (W > 0.5) revealed large differences in amino-
acid preferences at positions 60 and 61 relative to those shown
above (Figures 4F and 7B). In particular, for the ParD3 variants
that neutralized ParE3*, the invariant Trp60 was replaced by
Ile/Val/Leu and the strong preference for a negatively charged
residue at position 61 was replaced by positively charged or
neutral residues (Figures 4F and 7B). One of the high-fitness var-
iants with specificity residues LIAK, renamed ParD3*, no longer
neutralized ParE3 but robustly interacted with ParE3* (Figure 7C).
Taken together, our results indicate that mutations in the speci-
ficity residues of ParD3 and ParE3 are sufficient to create an
orthogonal, interacting protein pair.

Our results indicate that mutational paths leading to a change
in ParD specificity tend to pass through promiscuous intermedi-
ates (Figure 5). Thus, we wanted to determine whether muta-
tional paths between the wild-type ParD3-ParE3 and the orthog-
onal ParD3*-ParE3* systems also pass through promiscuous
intermediates, thereby changing the specificity of both proteins
without disrupting their interaction. We therefore generated var-
iants of ParE3 containing all possible subsets of the substitutions
in ParE3* (32 mutants) and variants of ParD3 containing all
possible subsets of the substitutions in ParD3* (4 mutants). We
then co-transformed each possible pairing of ParD3 and ParE3
variants (128 pairs total) into E. coli and assessed interaction us-
ing the toxicity-rescue assay (Figure 7D). Interestingly, 90 of the
128 pairs of ParD3 and ParE3 variants were capable of interact-
ing, likely because most (17 of 32) of the ParE3 variants were pro-
miscuous, which we define as interacting strongly with both
ParD3 and ParD3* (Figure 7D).

To determine whether paths between the wild-type and insu- 
lated ParD-ParE pairs tend to pass through promiscuous inter- 
mediates, we first enumerated the total number of trajectories 
between these systems. Assuming one residue is changed per 
step and no reversions are considered, there are 5,040 paths 
from ParD3-ParE3 to the orthogonal ParD3*-ParE3* pair; of
these paths, 1,030 retain functionality at each intermediate
step. Strikingly, we found that all of these 1,030 functional paths
passed through at least one promiscuous intermediate of ParE3
with an average of five promiscuous ParE3 intermediates per
path (Figure S7C). The prevalence of these promiscuous states
may enable the ParD-ParE system to readily evolve a new inter-
action specificity. An initial broadening of ParE3 specificity en-
ables the movement of ParD3 in sequence space, followed by
a narrowing of ParE3 specificity in the final step (Figure 7E). By
contrast, mutational paths in which a substitution in either 
ParD or ParE yields a “switch-like” change in specificity would, 
by definition, be broken until a second substitution restores the 
interaction. Thus, our results support the notion that the coevo- 
lution and expansion of the ParD-ParE family occurs through 
promiscuous intermediates. 
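
The path counts above follow from simple combinatorics, which the following Python sketch makes explicit. It assumes five substitutions separate ParE3 from ParE3* and two separate ParD3 from ParD3* (consistent with the 32 and 4 variants tested), so that every path is an ordering of seven single substitutions. The substitution labels are placeholders, and `toy_interacts` is an invented rule standing in for the measured toxicity-rescue data; it is not the experimental result, so the number of functional paths it yields is not expected to match the value reported in the text.

```python
from itertools import permutations

# Placeholder labels: two ParD3 and five ParE3 substitutions (assumed counts).
PARD3_SUBS = ("D_a", "D_b")
PARE3_SUBS = ("E_a", "E_b", "E_c", "E_d", "E_e")


def toy_interacts(pard_subs, pare_subs):
    """Invented rule, not measured data: wild-type ParE3 binds only wild-type
    ParD3, ParE3* binds only ParD3*, and every partially substituted ParE3
    is treated as promiscuous."""
    if len(pare_subs) == 0:
        return len(pard_subs) == 0
    if len(pare_subs) == len(PARE3_SUBS):
        return len(pard_subs) == len(PARD3_SUBS)
    return True


def count_paths(interacts):
    """Count all single-substitution paths (no reversions) and those whose
    intermediate states all retain the interaction."""
    total = functional = 0
    for order in permutations(PARD3_SUBS + PARE3_SUBS):
        total += 1
        acquired = set()
        ok = True
        for sub in order[:-1]:  # states after steps 1..6 are intermediates
            acquired.add(sub)
            pard = frozenset(s for s in acquired if s in PARD3_SUBS)
            pare = frozenset(s for s in acquired if s in PARE3_SUBS)
            if not interacts(pard, pare):
                ok = False
                break
        functional += ok
    return total, functional


n_pairs = 2 ** len(PARD3_SUBS) * 2 ** len(PARE3_SUBS)  # 4 x 32 variant pairs
total_paths, functional_paths = count_paths(toy_interacts)

print("variant pairs:", n_pairs)        # 128
print("total paths:", total_paths)      # 7! = 5040
print("functional paths under the toy rule:", functional_paths)
```

Replacing `toy_interacts` with a lookup into the 128 measured pairwise interactions would give the corresponding count for the real data; under the toy rule, a path stays functional only if it begins and ends with a ParE3 substitution.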

DISCUSSION 

Mutational Trajectories and the Coevolution 
of Protein-Protein Interactions 

Interacting proteins coevolve, and the identification of coevolv- 
ing amino acids in two proteins can often help to pinpoint the res- 
idues that mediate their interaction. Such analyses are typically 
predicated on the idea that a mutation in one protein that dis-
rupts an interaction then drives selection of a compensatory mu-
tation in the partner, thereby restoring the interaction (Figure 1A).
However, this model implies that organisms tolerate (at least 
transiently) a non-functional, or less functional, interaction, 
which seems unlikely if the protein-protein interaction is essential 
for viability. Our results provide a solution to this conundrum, 
demonstrating experimentally how interacting proteins can 
coevolve and acquire new specificity by having one of the pro-
teins pass through a promiscuous intermediate (Figure 1B). For
instance, a mutation in an antitoxin can initially broaden its spec-
ificity; the toxin can then accumulate a mutation that moves it in
sequence space but retains its interaction with the antitoxin. A
subsequent substitution in the antitoxin can then narrow its
specificity to include the mutated toxin and exclude the original
form. The net result is a change in specificity without disruption
of the protein-protein interaction, which is critical as a disruption
at any step would liberate a toxin that prevents growth and pro-
liferation. This model for protein coevolution involves a minimum
of three instead of two mutations but means that the protein-pro-
tein interaction is functional at each step. Thus, such mutational
trajectories could be entirely neutral but importantly would
retain a pairwise-coevolution signature in multiple sequence
alignments.

Figure 7. Mutational Trajectories to an Orthogonal ParD3*-ParE3* Pair

(A) ParE3* is insulated from antitoxin ParD3. A plasmid containing either ParE3 or ParE3* was co-transformed into E. coli with a plasmid expressing ParD3, and
cells were plated on medium that induces or represses expression of the toxin and antitoxin.

(B) Frequency logo for ParD3 library variants with high fitness against ParE3* (W_E3* > 0.5).

(C) ParE3*-ParD3* is insulated from the wild-type ParD3-ParE3 pair.

(D) Toxicity-rescue interaction assays for all ParD3 and ParE3 mutant combinations. Top left, wild-type ParD3-ParE3 pair; bottom right, orthogonal ParD3*-ParE3*
pair. Promiscuous ParE3 intermediates are those capable of interacting with both ParD3 and ParD3*.

(E) Example of a series of single substitutions that lead to the insulated ParE3*-ParD3* system while retaining the toxin-antitoxin interaction at each step by first
expanding the specificity of ParE3, followed by changes in ParD3, and finally by restricting the specificity of ParE3.

Also see Figure S7.

Our systematic identification of ParD3 variants that can antag- 
onize ParE3, ParE2, or both revealed an abundance of promiscu- 
ous variants in sequence space that are, on average, more highly 
connected to other functional variants than are specific variants. 
Consequently, the mutational trajectories that reprogram the 
specificity of ParD3 frequently involve promiscuous intermedi- 
ates (Figures 5F and 5G). The high frequency of mutational 
paths involving promiscuous intermediates was seen when 
considering transitions in ParD3 from being specific for ParE3 to
specific for ParE2, and even more so when considering muta-
tions on both sides of the interface. We assessed the complete
set of mutational trajectories between the wild-type ParD3-
ParE3 and the orthogonal ParD3*-ParE3* by testing 128 pairwise
interactions between all possible ParD3 and ParE3 mutational in-
termediates. Strikingly, 17 of the 32 ParE3 intermediate variants
were promiscuous, or capable of interacting with both the ParD3 
and ParD3* variants (Figure 7). Consequently, all of the functional 
paths between ParD3-ParE3 and ParD3*-ParE3* involved at 
least one promiscuous intermediate, with most involving more 
than five (Figure 7). Our results thus suggest that promiscuous 
variants of ParD and ParE are abundant in sequence space 
and that promiscuity-enabling mutations can facilitate the evolu- 
tion of new interaction specificities while still using the same set 
of interfacial residues. 

A similar principle may apply to other protein-protein interac- 
tions throughout biology, even those not involving toxic proteins. 
The disruption of a given protein-protein interaction could pre- 
vent the execution of an essential cellular function or lead to an 
unwanted, detrimental interaction with another protein, thus fa- 
voring coevolutionary trajectories that retain function at each 
step. This same principle may also underlie the coevolution of 
transcription factors and their DNA binding sites. The evolu- 
tionary history of a steroid hormone receptor and its recognition 
element was recently reconstructed including the analysis of a 
possible ancestral state of the steroid receptor and mutational 
intermediates separating it from extant states (Anderson et al., 
2015). Several of the intermediates were promiscuous and may 
have facilitated coevolution of the receptor and its recognition 
element toward a new specificity without disrupting the interac- 
tion. However, that study only considered mutational intermedi- 
ates containing residues present in the ancestral or derived 
states, and our analyses of the ParD-ParE interface suggest 
that promiscuous intermediates can also involve substitutions 
that appear in neither the ancestral nor the derived states. 

Like many protein families, toxin-antitoxin systems can 
expand through duplication and divergence. The duplication of 
a toxin-antitoxin system could allow one of the protein pairs to 
wander unconstrained in sequence space toward a new interac- 
tion specificity via switch-like paths that involve non-functional 
intermediates. After a duplication, one antitoxin could accumu- 
late interaction-disrupting substitutions while its toxin is still in- 
hibited by the other antitoxin. The toxin could then subsequently 
mutate to restore an interaction with the derived antitoxin. How- 
ever, this scenario assumes that the evolving antitoxin does not, 
in the intermediate state, interact inappropriately with other pro- 
teins, and it assumes that the other antitoxin is produced at suf- 
ficiently high levels to inhibit 2-fold more toxin, i.e., that there is 
normally a significant excess of free antitoxin, which may not 
be the case. Determining whether and when switch-like or pro- 
miscuous paths are followed will require careful reconstructions 
of toxin-antitoxin evolution. 

High-Throughput Mapping of Protein Interaction 
Specificity 

Deep mutational scanning via next-generation sequencing is a 
relatively new approach for interrogating the relationship be-
tween protein sequence and function, including folding, enzy-
matic activity, or the binding of a target protein or RNA (Fowler
and Fields, 2014). These studies have begun to reveal the func- 
tional degeneracy of proteins by examining all, or nearly all, 
possible single mutants of a given protein. Similar approaches 
have also been used to probe subsets of all possible double 
and higher-order mutants (Melamed et al., 2013) or to systemat- 
ically probe all possible mutants at a limited set of positions 
(Podgornaia and Laub, 2015). 

Deep mutational scans have been focused primarily on how 
mutations alter a single function or protein interaction. One study 
examined the ability of a PDZ domain to interact with both a 
cognate and non-canonical peptide ligand (McLaughlin et al., 
2012), but only queried single-point mutants. However, the inter-
action specificity of a protein is a distributed property of multiple
amino acids, and the prevalence of epistasis means that the 
behavior of multiple mutations is difficult to infer from the prop- 
erties of the corresponding single mutants. We queried a diverse 
library of ParD3 variants harboring multiple mutations of key 
specificity residues against two separate proteins: the cognate 
toxin ParE3 and the non-cognate toxin ParE2. This focused li- 
brary approach was possible as the specificity of ParD is largely 
determined by a small number of interfacial residues (Figure 3). 
Our approach yielded a high-density map of the sequence 
space of ParD3 that underpins its substrate interaction speci- 
ficity (Figures 5A-5D). From these data, we uncovered the resi- 
dues in ParD3 most responsible for its selective binding of one 
toxin over another (Figures 4F and 4I). We found that three posi-
tions (60, 61, and 64) primarily dictate specificity, with substitu-
tions at two sites (61 and 64) sufficient to switch ParD3 from
antagonizing ParE3 to ParE2, and substitutions at an overlap-
ping set of sites (60 and 61) sufficient to switch ParD3 from
antagonizing ParE3 to ParE3*. As noted, our results also demon-
strated the existence of many residue combinations that pro-
mote a promiscuous state of ParD3 or ParE3. Mutations that
render proteins more promiscuous, with respect to catalytic ac-
tivities or binding partners, have been noted anecdotally (Aharoni
et al., 2005; Bloom and Arnold, 2009), but the prevalence of such
states and, importantly, their accessibility from more specific, 
wild-type states has never been mapped in a comprehensive 
manner. 

By building and screening libraries harboring multiple muta- 
tions, our work also sheds new light on protein epistasis and 
the non-additive relationship of individual substitutions. Epis- 
tasis has been well documented but is typically assessed with 
respect to a single-protein function. By contrast, the epistasis 
documented here for ParD3 pertains to its specificity and inter- 
action with two different proteins, revealing interdependencies 
that would be missed when considering only a single function. 
For instance, consider the example in Figure 6A where ParD3 
transitions from the E3-specific residues LWDK to the E2-spe- 
cific residues LWKL. With respect to antagonizing the toxin 
ParE3, the two single mutants, LWDL and LWKK, are each func- 
tional. However, with respect to toxin ParE2, LWDL is functional 
whereas LWKK is not, reflecting a non-additive relationship be- 
tween the two substitutions leading to the double mutant 
LWKL. This type of epistasis may, like other forms of epistasis, 
restrict the evolution of ParD-ParE systems, which likely follows 
mutational paths that involve promiscuous states, as discussed
above. 

Interaction Specificity of Toxin-Antitoxin Systems

The specificity of interactions in bacterial toxin-antitoxin systems 
had previously been unclear, with some reports indicating that 
these protein-protein interactions are specific (Fiebig et al., 
2010) and others suggesting that TA systems form large, 
cross-reactive networks (Yang et al., 2010; Zhu et al., 2010). 
Here, by performing a systematic assessment of interaction 
specificity for a TA family, we found that ParD antitoxins typically 
exhibit an exquisite preference for binding to their co-transcribed 
ParE toxins, forming exclusive, cognate pairs. Of 180 non-
cognate pairings tested, we found cross-talk in only 11 cases
(Figure 2) and, importantly, no cross-talk was observed for 
non-cognate pairs present in the same species. 

The high degree of protein-protein interaction specificity 
observed for the ParD-ParE family is similar to that observed 
for other large, paralogous protein families (Newman and Keat- 
ing, 2003; Skerker et al., 2008; Stiffler et al., 2007; Zarrinpar 
et al., 2003). The specificity of many of these paralogous families 
has been attributed to selection against detrimental cross-talk 
(Capra et al., 2012; Zarrinpar et al., 2003), raising the possibility 
that the ParD-ParE family may be under similar selective pres- 
sures. However, the biological rationale for maintaining the 
specificity of TA systems is unclear, and will require a deeper un- 
derstanding of the function of these systems in bacterial 
physiology. 

Final Perspective 

In sum, our work provides a rationale and molecular basis for 
how protein interaction specificity can change and how two pro- 
teins can coevolve without involving non-functional intermedi- 
ates. Mutations that produce promiscuity have been described 
for a variety of proteins, but the frequency of such mutations 
and their accessibility from more specific states had been un- 
clear. Our results indicate that, at least for ParD3 and likely other 
proteins, promiscuous mutants are prevalent and easily reached 
from the wild-type sequence through a single mutation. The 
prevalence of promiscuous intermediates may facilitate the 
expansion of toxin-antitoxin systems and, more broadly, other 
paralogous protein families. 

EXPERIMENTAL PROCEDURES

ParD3-ParE Structure Analysis

For details on the structural analysis of M. opportunistum ParD3 and ParE3,
see Supplemental Experimental Procedures.

Identification of Coevolving Residues 

Coevolving residues in the ParDE family were identified using GREMLIN at
http://gremlin.bakerlab.org. Input sequences were ParD3 and ParE3 from
M. opportunistum, and we set the number of iterations to four and the E-value
cutoff to 1E-04. To identify specificity residues, we isolated all residue pairings
that had a scaled coupling score greater than 1.25. To identify supporting res-
idues, we performed the following iterative procedure using a score cutoff of
1.25: (1) identify residues within ParD or ParE that covary with the specificity
residues; (2) identify residues within ParD or ParE that covary with either the
specificity residues or the supporting residues identified in step (1); (3) repeat
step (2) until no new supporting residues are identified.
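
The iterative procedure in steps (1) to (3) amounts to expanding outward from the specificity residues over all couplings that pass the cutoff. The following Python sketch shows one way to express it. It is an illustration under assumptions, not the code used in the study: the function name, the input format (a mapping from residue pairs to scaled coupling scores), and the example scores are all hypothetical, and the cutoff is applied as a strict "greater than 1.25".

```python
def find_supporting_residues(couplings, specificity, cutoff=1.25):
    """Return residues linked to the specificity residues, directly or via
    other supporting residues, by coupling scores above the cutoff."""
    specificity = set(specificity)

    # Keep only residue pairs whose scaled coupling score passes the cutoff.
    neighbors = {}
    for (res_a, res_b), score in couplings.items():
        if score > cutoff:
            neighbors.setdefault(res_a, set()).add(res_b)
            neighbors.setdefault(res_b, set()).add(res_a)

    supporting = set()
    frontier = set(specificity)
    while frontier:  # steps (2) and (3): repeat until nothing new is found
        found = set()
        for residue in frontier:
            found |= neighbors.get(residue, set())
        frontier = found - specificity - supporting
        supporting |= frontier
    return supporting


# Hypothetical intra-protein coupling scores; residue labels are placeholders
# apart from ParD3 positions 61 and 64, which the text names as specificity
# determinants.
example_couplings = {
    ("ParD:61", "ParD:64"): 1.9,  # both are specificity residues
    ("ParD:61", "ParD:x1"): 1.6,  # found in step (1)
    ("ParD:x1", "ParD:x2"): 1.3,  # found in step (2)
    ("ParD:x2", "ParD:x3"): 1.1,  # below the cutoff, not followed
    ("ParE:y1", "ParE:y2"): 2.0,  # strong, but unconnected to specificity
}

supporting = find_supporting_residues(example_couplings, {"ParD:61", "ParD:64"})
print(sorted(supporting))
```

Because each pass only adds residues not already seen, the loop is guaranteed to terminate, which corresponds to the stopping condition in step (3).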



ParD3 Library Construction and Analysis

For details on construction of the ParD3 library, see the Supplemental Exper-
imental Procedures. To assess the ability of each ParD3 variant to antagonize
different ParE toxins, E. coli cells harboring the ParD3 plasmid library were
electroporated with a plasmid containing an arabinose-inducible copy of the
ParE toxin. Cells were grown out overnight in 200 ml M9L supplemented
with 0.4% glucose and antibiotics. The following day, cells were spun down,
washed in 50 ml of M9L, and re-suspended at an OD of 0.03 in 500 ml of
M9L supplemented with 100 µM IPTG (to induce the ParD3 library) and antibi-
otics. Cells were grown out at 37°C with shaking for 100 min, and then ParE
toxin expression was induced by the addition of 0.2% arabinose. Cell density
was measured every 20 min and samples (50 ml) were taken every 100 min,
pelleted, and frozen at -20°C. Competitive liquid growth assays were per-
formed in duplicate. Plasmid DNA was extracted and used as template for
PCR (20 cycles) with custom barcoded primers containing Illumina flowcell
adaptor sequences. Samples were sequenced on an Illumina HiSeq and
then filtered, counted, and converted to fitness values as described in the Sup-
plemental Experimental Procedures.

SUPPLEMENTAL INFORMATION 

Supplemental Information includes Supplemental Experimental Procedures, 
seven figures, and one table and can be found with this article online at 
http://dx.doi.org/10.1016/j.cell.2015.09.055.
SUMMARY

Type VI secretion (T6S) influences the composition of 
microbial communities by catalyzing the delivery of 
toxins between adjacent bacterial cells. Here, we 
demonstrate that a T6S integral membrane toxin 
from Pseudomonas aeruginosa, Tse6, acts on target 
cells by degrading the universally essential dinucleo-
tides NAD+ and NADP+. Structural analyses of Tse6
show that it resembles mono-ADP-ribosyltransfer- 
ase proteins, such as diphtheria toxin, with the 
exception of a unique loop that both excludes pro- 
teinaceous ADP-ribose acceptors and contributes 
to hydrolysis. We find that entry of Tse6 into target 
cells requires its binding to an essential house- 
keeping protein, translation elongation factor Tu 
(EF-Tu). These proteins participate in a larger assem- 
bly that additionally directs toxin export and provides 
chaperone activity. Visualization of this complex by 
electron microscopy defines the architecture of a 
toxin-loaded T6S apparatus and provides mecha- 
nistic insight into intercellular membrane protein de- 
livery between bacteria. 

INTRODUCTION 

Bacteria utilize a diverse group of secreted toxins to establish 
and defend their niche. Among these are the effectors exported 
by the type VI secretion system (T6SS), which are delivered to 
target cells in a contact-dependent manner (Hood et al., 2010; 
LeRoux et al., 2012; Russell et al., 2011). Despite the tremendous 
number and predicted diversity of T6 effectors, few activities 
have been ascribed to this important group of proteins. 

The majority of characterized T6 effectors act in the periplasm 
of target Gram-negative cells. Within this compartment, the pro- 
teins disrupt essential structures, such as cell-wall peptido- 
glycan (via amidase and glycoside hydrolase activity), and 
cellular membranes (via phospholipase and pore-forming activ-
ity) (Russell et al., 2014). Although a large number of cytotoxic T6
effectors have been identified, the mechanisms by which they in-
fluence recipient cells are not well understood (Fritsch et al.,
2013; Hood et al., 2010; Whitney et al., 2014). Indeed, a group 
of related effectors that exhibit DNase activity are the only 
such proteins yet characterized (Ma et al., 2014). 

Unlike other proteinaceous toxins, such as the colicins, T6S
effectors do not possess cell-entry mechanisms. Rather, they
transit the T6SS, which breaches the outer membrane of recip-
ient cells and thereby grants its substrates access to the cell inte-
rior (Russell et al., 2011). Many components of the T6SS bear
structural and functional relatedness to tail proteins of contractile
bacteriophage (Silverman et al., 2012). The delivery of T6 effec-
tors into recipient cells has not been directly visualized; however,
it is likely that they are propelled into recipient cells during
phage-like contraction events of the apparatus (Basler et al.,
2012). How T6 effectors are recruited to the secretory apparatus
is not completely understood. Evidence suggests at least two
genetically distinct mechanisms operate. One subset of T6 ef-
fectors requires direct interaction with the interior of ring-shaped
phage tail tube-like haemolysin co-regulated proteins (Hcp) for
export (Silverman et al., 2013). Hcp proteins themselves are
abundantly secreted in a T6-dependent manner, leading to the
proposal that these toxins are delivered to recipient cells in com-
plex with Hcp. The relatively low molecular weight of Hcp-asso-
ciated effectors suggests that interaction with the pore of Hcp
places constraints on the size of toxins that can be delivered
via this pathway.

A second subset of effectors, including many that are high-mo-
lecular-weight, multi-domain proteins, requires specific valine-
glycine repeat protein G (VgrG) type proteins for export (Hachani
et al., 2014; Whitney et al., 2014). VgrG proteins form homotri-
meric assemblies that have extensive structural homology with
phage tail spike proteins, and, like Hcp, are secreted in a T6-
dependent manner. Also analogous to Hcp, the requirement for
VgrG proteins in effector export is thought to reflect a physical as-
sociation of these proteins with cognate effectors. The biochem-
ical basis for VgrG-effector interaction is not well studied; howev-
er, modular adaptor domains (present either as domains within
the effector protein or as independent polypeptides) appear to
mediate binding. One such domain harbors PAAR repeat se-
quences, which fold into a pyramidal structure that interacts
with the tip of the VgrG spike (Shneider et al., 2013). Despite
recent advances in our understanding of the mechanisms under-
lying T6S-dependent interbacterial interactions, the structure of a
T6 effector in complex with a VgrG family protein has remained
elusive.

Figure 1. Tse6 Causes Stasis from the Cytoplasm of P. aeruginosa

(A) Genomic context of vgrG1, tsi6, tse6, and eagT6 in P. aeruginosa PAO1.
Locus tag numbers are provided below each gene. The color of each gene
corresponds to the color of its encoded protein shown in subsequent figures.

(B) Domain organization of P. aeruginosa Tse6. The boundaries for the PAAR
(residues 64-175) and toxin (residues 282-430) domains are indicated. Pre-
dicted transmembrane domains are shown as dark gray rectangles.

(C) Intoxication of P. aeruginosa by Tse6 severely reduces growth. Data
were derived from single-cell analysis of a parental strain (ΔretS ΔsspB
pPSV38::sspB) and a derivative depleted of Tsi6 (ΔretS ΔsspB tsi6-D4
pPSV38::sspB). Bin size is 20 min and is normalized to total cells (parental, n =
15,042; tsi6-D4, n = 5,568).

(D) Tsi6 depletion strains undergo Tse6-based toxicity independent of inter-
cellular toxin delivery by a functional H1-T6SS. Patches of the indicated
P. aeruginosa strains grown for 24 hr at 37°C under Tsi6 depletion-inducing
(+IPTG) or non-inducing (-IPTG) conditions are shown. The parental strain is
the same as in (C).

See also Movies S1 and S2 and Tables S2 and S3.

The genome of Pseudomonas aeruginosa encodes three
T6SSs; each mediates antagonistic interactions with contacting
Gram-negative bacterial cells (Hood et al., 2010; Jiang et al.,
2014; Russell et al., 2013). The most extensively studied of these
is the Hcp secretion island I-encoded T6SS (H1-T6SS), which
delivers at least six effectors to recipients. Prior work estab-
lished that one of these, type VI secretion exported 6 (Tse6), is
a predicted transmembrane protein that contains a PAAR repeat
domain, is exported in a VgrG-dependent manner, and is active
in the cytoplasm of target cells (Whitney et al., 2014). Here, we
demonstrate that Tse6 intoxicates by depleting cells of the
related co-factors β-nicotinamide adenine dinucleotide (NAD+)
and NAD+ phosphate (NADP+), thereby simultaneously inhibit-
ing anabolic and catabolic processes required for homeostasis
and growth. We make the surprising discovery that Tse6 re-
quires interaction with translation elongation factor Tu for deliv-
ery into recipient cells and define the structural and biochemical
basis for interbacterial transfer of this membrane-associated
toxin.

RESULTS 

Tse6 Is a Bacteriostatic Toxin 

We previously found that Tse6 is an H1-T6SS-dependent anti-
bacterial effector that requires vgrG1 for intercellular delivery
(Whitney et al., 2014). We further demonstrated that the toxic ac-
tivity of Tse6 resides in its C terminus and can be neutralized by
expression of a cognate immunity protein, Tsi6 (Figures 1A and
1B). Additionally, sequence and structural prediction algorithms
identify a PAAR domain (Tse6PAAR) flanked by transmembrane
segments in the N terminus of the protein.

The toxin domain of Tse6 does not bear homology to charac-
terized proteins. The majority of studied antibacterial T6S effec-
tors act on structures that are important for cellular integrity
(Russell et al., 2014). Accordingly, intoxication by these effectors
promotes morphological changes and cell lysis (LeRoux et al.,
2012). We examined P. aeruginosa cells undergoing Tse6-based
intoxication via depletion of Tsi6. The H1-T6SS is quiescent in
monoculture; thus, we performed this and subsequent experi-
ments in a background with activated expression of the system
(ΔretS) (LeRoux et al., 2015). Single-cell analyses showed that Tse6-
intoxicated cells displayed a dramatic increase in division time,
but generally maintained their structural integrity (Figure 1C;
Movies S1 and S2). The markedly slower growth of these cells
was also apparent macroscopically; strains depleted of Tsi6
failed to form visible colonies after 24 hr of incubation (Figure 1D).
Depletion of Tsi6 from P. aeruginosa cells lacking H1-T6SS func-
tion (ΔtssM1), and thus the capacity to transport effectors inter-
cellularly, yielded indistinguishable effects, indicating that the
toxin domain of Tse6 accesses the cytoplasm of donor cells prior
to export.

Tse6 Resembles Mono-ADP-Ribosyltransferase Toxins

To gain further insight into Tse6 function, we determined the
1.4 Å resolution crystal structure of its C-terminal toxin domain
(residues 282-430, Tse6282-CT) in complex with Tsi6 (Figure 2A;
Table S1). Importantly, expression of Tse6282-CT alone induced
stasis in Escherichia coli, recapitulating the phenotype of the
full-length toxin in P. aeruginosa (Figure S1A). Tse6282-CT adopts
a mixed α/β fold comprised of two N-terminal α helices and a cen-
tral core that is formed by two perpendicularly oriented β sheets
(Figure 2A). A search of the PDB using DALI indicated that the
closest structural homologs of Tse6282-CT are the catalytic do-
mains of bacterial mono-ADP-ribosyltransferase (mART) toxins
(Holm and Rosenstrom, 2010; Simon et al., 2014). Identified
members of this family include diphtheria toxin (DT) from Coryne-
bacterium diphtheriae (Z score, 4.7; root-mean-square devia-
tion [RMSD] of 4.3 Å over 87 equivalent positions) and Exotoxin A
(ExoA) from P. aeruginosa (Z score, 3.0; RMSD of 2.8 Å over 73
equivalent positions). These and other characterized bacterial
mART enzymes are secreted virulence factors that transfer the
ADP-ribose moiety of NAD+ onto eukaryotic proteins, typically
leading to target protein inactivation, dramatic changes in cellular
physiology, and, often, cell death (Simon et al., 2014).

Despite a high degree of sequence divergence within mART
proteins, they possess a structurally conserved β sheet core
that harbors the molecular determinants for NAD+ binding
(Fieldhouse et al., 2010; Zhang et al., 2014). Structural alignment of
Tse6_282-CT with DT shows that peripheral secondary structure elements
differ significantly, while the two β sheets that comprise
the core overlay well (Figure 2B). Characterized mART enzymes
are subdivided into two main groups based on the identities of
amino acid residues at three positions involved in NAD+ binding
(Fieldhouse and Merrill, 2008). Members of the DT group use His,
Tyr, and Glu, whereas cholera toxin-type (CT) proteins retain Glu
at position 3 but use Arg and Ser at positions 1 and 2, respectively.
Strict conservation of the glutamate between the two



Figure 2. The Toxin Domain of Tse6 Adopts
a mART Fold and Harbors a Putative NAD+
Binding Site

(A) Overall structure of the Tse6_282-CT-Tsi6 complex.
Tse6_282-CT is shown in ribbon (left) and
space-filling (right) representations. Secondary
structure elements are labeled. Dots denote a
disordered segment (amino acids 400-408) of
Tse6_282-CT that was not modeled.

(B) Tse6_282-CT resembles mART toxins. Structural
alignment of Tse6_282-CT with the catalytic domain
of diphtheria toxin (PDB: 4AE1). Inset shows a
structural alignment of the three conserved NAD+
binding residues (circled numbers) of diphtheria
toxin and Tse6. The numbers correspond to amino
acid positions within Tse6.

(C) Tsi6 interacts with the putative NAD+ binding
pocket of Tse6. Structural alignment of free Tsi6
and Tsi6 bound to Tse6_282-CT. The structure of Tsi6
does not change significantly upon complex formation
(e.g., Glu63), except for Lys62, which rotates
~120° and interacts with Gln413 of Tse6.
See also Figure S1 and Tables S1-S3.



groups may reflect its role in stabilizing
the oxocarbenium intermediate that
forms upon nicotinamide dissociation
from ADP-ribose during the catalytic cycle
(Yates et al., 2006). Our structure indicates that Tse6 residues
differ from those of both mART groups at each position involved
in NAD+ binding, including the placement of a non-acidic residue
at position 3 (Gln413) (Figure 2B). Nonetheless, the pocket lined
by these residues is the principal site of Tsi6 binding, suggesting
its importance for the toxic activity of Tse6. In total, our structural
analyses suggest that Tse6_282-CT is a mART fold enzyme with
unique substrate binding and catalytic motifs.

Tsi6 assumes an all α-helical fold that arranges into a four-helix
bundle (Figure 2A). A search of the PDB indicates that Tsi6
shares structural similarity with several proteins of unknown
function including Nmul_A1745 from Nitrosospira multiformis (Z
score, 10.4; RMSD of 2.2 Å over 86 equivalent positions)
and PA2107 from P. aeruginosa (Z score, 9.5; RMSD of
2.1 Å over 82 equivalent positions). The Tse6_282-CT-Tsi6 interaction
involves extensive contacts between α3 of Tsi6 and the
putative NAD+ binding pocket of Tse6_282-CT. Interface analysis
indicates that complex formation between Tse6_282-CT and Tsi6
buries 1,348 Å^2 of solvent-accessible surface area. Isothermal
titration calorimetry (ITC) measurements yielded a dissociation
constant of 31 nM for the complex (Figure S1B).

To identify the conformational changes within Tsi6 required for
inhibition of Tse6 activity, we determined the 1.9 Å crystal structure
of Tsi6 in isolation (Table S1). Overall, the structure of free
Tsi6 does not differ significantly from that of Tsi6 in complex
with Tse6_282-CT (Cα RMSD of 0.4 Å) (Figure S1C). This includes
amino acid side chains of Tsi6 involved in the interaction with
Tse6_282-CT, with the notable exception of Lys62, which rotates
approximately 120° around Cβ to form a hydrogen bond with
the putative NAD+ binding residue at position 3, Gln413
(Figure 2C). Taken together, our structural data suggest that Tsi6
Figure 3. Tse6 Is an NAD(P)+ Glycohydrolase
Toxin

(A) mART toxins possess open active sites that
allow for docking of their protein targets.
Co-crystal structure of P. aeruginosa ExoA and eukaryotic
elongation factor 2 (eEF2) from Jørgensen
et al. (2005). The diphthamide moiety of eEF2 that
is ADP-ribosylated by ExoA is shown in pink as a
stick representation.

(B) Structural superposition of Tse6_282-CT with
ExoA predicts a steric clash with eEF2. The clash
occurs through the conserved [K/R]STxxPxxDxx[S/T]
motif of Tse6 (red).

(C and D) Tse6_282-CT exhibits NAD(P)+ glycohydrolase
activity. Rate of NAD+ (C) and NADP+ (D)
consumption by purified Tse6_282-CT in the presence
and absence of Tsi6. Each enzyme concentration
was assayed in triplicate, and error bars
represent ± SD.

(E) Mass spectra of the products generated by
Tse6-catalyzed breakdown of NAD+. Peaks corresponding
to nicotinamide ([M+H]+, m/z = 123.1)
and ADP-ribose ([M-H]-, m/z = 558.3) were
identified in the reaction containing Tse6_282-CT,
whereas NAD+ ([M+H]+, m/z = 664.4) was identified
in the reaction containing the Tse6_282-CT-Tsi6
complex.

(F) NAD(P)+ levels in E. coli cells expressing
Tse6_282-CT (-Tsi6) or co-expressing Tse6_282-CT
and Tsi6 (+Tsi6) relative to empty vector. Cellular
NAD(P)+ levels were assayed 60 min after induction
of Tse6_282-CT expression.

(G) Relative NAD(P)+ levels in the indicated
P. aeruginosa strains 45 min after induction of Tsi6
degradation. Strains correspond to those used in
Figure 1C. Error bars represent ± SD (n = 3).

See also Figure S2 and Tables S2 and S3.



inhibits the activity of Tse6 through direct occlusion of its putative
NAD+ binding site.

Tse6 Exhibits NAD(P)+ Glycohydrolase Activity

The structure of Tse6_282-CT implies that the toxin may exert its effects
within recipient cells via mono-ADP-ribosylation of an unknown
bacterial protein. An important feature of characterized
mART enzymes that facilitates their transferase activity is an
open active site that allows docking of the acceptor protein.
This concept is exemplified by the co-crystal structure of
P. aeruginosa ExoA in complex with eukaryotic elongation factor
2 (Figure 3A) (Jørgensen et al., 2005). However, structural superposition
of Tse6_282-CT with ExoA predicts a steric clash between
Tse6_282-CT and a proteinaceous ADP-ribose acceptor (Figure 3B).
Interestingly, the structural element of Tse6_282-CT that
prohibits accommodation of a high-molecular-weight acceptor
is comprised of a motif conserved among Tse6 orthologs
([K/R]STxxPxxDxx[S/T]), implying that this region is important for
Tse6 function (Zhang et al., 2012). Consistent with these data,
incubation of purified Tse6 with P. aeruginosa or E. coli cell lysates
containing ^32P-NAD+ did not lead to observable transfer of
^32P-ADP-ribose to a protein target (data not shown).



Given the limited accessibility of the Tse6 active site, we hypothesized
that the protein might instead function as an NAD+
glycohydrolase. Although it is less common within the mART superfamily
of enzymes, NAD+ glycohydrolase activity has been
observed for the SPN toxin of Streptococcus pyogenes (Ghosh
et al., 2010). To test our hypothesis that Tse6 is an NAD+ glycohydrolase
enzyme, we performed kinetic analyses of NAD+ consumption
by Tse6_282-CT. Whereas mART enzymes exhibit only
low levels of NAD+ hydrolysis (<10 min^-1), we found that purified
Tse6_282-CT catalyzes NAD+ breakdown at a rate of approximately
1.2 × 10^[?] min^-1 (Figure 3C) (Ghosh et al., 2010). This activity was
reduced to background by the addition of 1.5 molar equivalents
of Tsi6 to the reaction mixture, suggesting NAD+ degradation is a
physiologically relevant activity of the toxin. Given the structural
similarity between NAD+ and its phosphorylated derivative
NADP+, we also tested the ability of Tse6_282-CT to consume
NADP+. Breakdown of this dinucleotide occurred at a comparable
rate (6.0 × 10^[?] min^-1), suggesting that Tse6 degrades both
NAD+ and NADP+ (NAD(P)+) (Figure 3D).

Rather than hydrolytically cleaving their substrates, some
NAD+-degrading enzymes generate a cyclic product that is a
characterized signaling molecule in eukaryotes (Guse, 2000).
The fluorescence assay we employed does not distinguish between
cyclized and non-cyclized forms of ADP-ribose; thus, we
used mass spectrometry (MS) to analyze the reaction products
of Tse6 and NAD+. Nicotinamide and ADP-ribose were the only
detectable products, defining Tse6 as an NAD(P)+ glycohydrolase
enzyme (Figure 3E).

Tse6 Induces Bacteriostasis by Depleting Cellular
NAD(P)+ Levels

Our biochemical data show that Tse6 rapidly hydrolyzes NAD(P)+
in vitro; however, it is possible that we observe this activity due to
the absence of an appropriate ADP-ribose acceptor molecule.
To address this possibility, we expressed Tse6_282-CT in E. coli
and measured endogenous NAD(P)+ levels. Upon induction of
Tse6_282-CT expression, we found that E. coli cells contained
vastly reduced cellular concentrations of NAD+ and NADP+ relative
to the vector control (Figure 3F). Co-expression with Tsi6
restored NAD(P)+, indicating that the loss of the dinucleotides
is a direct consequence of Tse6_282-CT activity.

Next, we sought to measure the influence of endogenous Tse6
on NAD(P)+ levels in intoxicated P. aeruginosa cells. In agreement
with our findings in E. coli, intracellular intoxication caused
by depletion of Tsi6 led to a profound decrease in NAD(P)+
(Figure 3G). The precise measurement of Tse6-catalyzed NAD(P)+
depletion during intercellular intoxication is complicated by
high background levels of the dinucleotides derived from donor
cells, which, as intoxication of recipient cells proceeds, constitute
an increasingly large proportion of the total cellular population.
To partially overcome this, we examined a time point at which
recipient cells have begun to experience intoxication but are
not yet depleted from the population. Comparing total NAD+
levels in conjunction with donor and recipient colony-forming
units (CFU) to a reference experiment, we confirmed a significant
reduction in NAD+ within recipient cells (Figure S2). Based on
these findings, we propose that the toxicity elicited by Tse6 is
due to depletion of cellular NAD(P)+ pools.

Tse6 Participates in a Five-Protein Complex Containing 
Elongation Factor Tu 

Bioinformatic analyses predict that Tse6 is a PAAR domain-containing
integral membrane protein (Figure 1B). To test whether
Tse6 resides in membranes, we generated a P. aeruginosa
strain producing a functional fusion of Tse6 to the vesicular
stomatitis virus glycoprotein epitope from the native tse6 locus
(tse6-V) (Figure S3A). Western blot analysis of the soluble and
membrane fractions of this strain revealed that, despite
high-confidence prediction of transmembrane domains within Tse6,
the majority of the protein is soluble (Figure 4A). Based on
structural studies of PAAR domains in complex with VgrG-like
chimeras, it has been speculated that effectors containing
this domain interact with VgrG proteins (Shneider et al., 2013).
We hypothesized that Tse6 could be solubilized within donor
cells by virtue of association with VgrG1 via its PAAR domain.
Indeed, in the absence of VgrG1, we observed significant repartitioning
of Tse6 to the membrane fraction of cells. Tse6
remained soluble in a strain lacking tssM1, indicating that
the localization of the toxin is not generally sensitive to T6
function.



Motivated by the finding that Tse6 is a soluble protein in the
presence of VgrG1, we used co-immunoprecipitation to probe
for a physical interaction between the proteins. Surprisingly,
this led to the identification of a putative complex containing
Tse6, Tsi6, VgrG1, PA0094, and translation elongation factor
Tu (EF-Tu) (Figure 4B). PA0094 is a member of a recently
described group of effector-specific accessory factors that facilitate
delivery of their cognate effectors (Alcoforado and Coulthurst,
2015). Henceforth, we refer to PA0094 as effector-associated
gene with tse6 (EagT6).

The identification of EF-Tu in a complex containing Tse6 was
unexpected. This conserved bacterial protein is a GTPase that
delivers newly charged aminoacyl-tRNA molecules to the ribosome
during translation elongation (Voorhees and Ramakrishnan,
2013). Interactions between T6 effectors and essential
bacterial proteins have not been described; therefore, we
decided to probe the functional significance of this observation.
To test whether Tse6 and EF-Tu interact directly, we initiated experiments
to evaluate P. aeruginosa EF-Tu (EF-Tu^Pa) binding to
the soluble region of Tse6 (Tse6_222-CT) in vitro. Interestingly, during
the course of this work, we noted that Tse6_222-CT associates with
E. coli EF-Tu (EF-Tu^Ec), which is 88% identical to EF-Tu^Pa.
Purification of N-terminal Tse6 truncations narrowed the region
responsible for EF-Tu binding to the last 165 amino acids of
the toxin (Figure 4C). Since EF-Tu does not bind the glycohydrolase
domain of Tse6 (residues 282-CT), we reasoned that the
interaction with EF-Tu requires amino acids 265-282 of the toxin.
Despite considerable sequence divergence from P. aeruginosa
Tse6, orthologs of the toxin from P. putida and P. syringae also
co-purified with EF-Tu^Ec (Figures S3B and S3C).

Next, we measured the binding affinity of Tse6_265-CT to EF-Tu^Pa
and EF-Tu^Ec using ITC. Guanosine triphosphate (GTP) hydrolysis
by EF-Tu is coupled to significant structural changes in
the protein; therefore, we investigated both GTP- and GDP-bound
(EF-Tu•GDP) conformations of the molecule (Clark and Nyborg,
1997). In line with our purification results, we found that
Tse6_265-CT interacts tightly with the GDP-bound form of both
proteins (Kd = 81 nM and Kd = 23 nM) (Figures S3D and S3E).
Application of the antibiotic Aurodox, which locks EF-Tu into
its GTP-bound conformation, reduced the affinity for Tse6 by
approximately 10-fold (Figure S3F) (Vogeley et al., 2001). In summary,
these data indicate that Tse6 binds directly to the GDP
form of EF-Tu within a larger, multiprotein complex.
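
As a rough guide to what these affinities mean energetically, a dissociation constant can be converted to a standard binding free energy with ΔG° = RT ln(Kd). The sketch below is illustrative only: the temperature (298.15 K) and the 1 M standard state are assumptions that are not stated in the text, and the function name is hypothetical. Under those assumptions, an approximately 10-fold loss of affinity, as observed with Aurodox, corresponds to roughly 1.4 kcal/mol.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def binding_free_energy(kd_molar, temp_k=298.15):
    """Standard binding free energy (kcal/mol) from a dissociation constant.

    Assumes a 1 M standard state. temp_k is an assumed temperature,
    not a value reported in the text.
    """
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return R_KCAL * temp_k * math.log(kd_molar)


# Kd values quoted in the text for the two Tse6-EF-Tu*GDP complexes (nM)
for kd_nm in (81, 23):
    dg = binding_free_energy(kd_nm * 1e-9)
    print(f"Kd = {kd_nm} nM -> dG = {dg:.1f} kcal/mol")

# Free-energy penalty of a 10-fold weaker Kd at the assumed temperature
penalty = binding_free_energy(10e-9) - binding_free_energy(1e-9)
print(f"10-fold weaker binding costs {penalty:.2f} kcal/mol")
```

Because the conversion is logarithmic, the difference between the two measured Kd values amounts to less than 1 kcal/mol at the assumed temperature.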

Structure of the Tse6-EF-Tu Complex 

Though all cells require NAD(P)+, the process of translation
does not rely on these co-factors. Thus, the significance of
Tse6 interaction with EF-Tu was not apparent. As a first step
toward defining the relevance of the Tse6-EF-Tu complex, we
determined the 3.5 Å crystal structure of Tse6_265-CT bound to
EF-Tu^Pa•GDP (Figure 4D; Table S1).

Overall, the structure of Tse6_265-CT is highly similar to Tse6_282-CT
(Cα RMSD of 0.7 Å). The most striking divergence between the
two structures is the ordering and 60° hinge-like movement of
the [K/R]STxxPxxDxx[S/T] motif-containing loop, henceforth
referred to as the Tse6 activation loop. This results in a ~15 Å
displacement of Asp396 that directs its side chain into the putative
NAD(P)+ binding site (Figure 4E). Asp396 is the sole invariant
Figure 4. Tse6 Participates in a Multi-protein Complex and Binds Helix D of EF-Tu through Residues N-Terminal to Its Toxin Domain

(A) Tse6 is a membrane protein that is solubilized by VgrG1. Western blot analysis of the soluble (S) and membrane (M) fractions of the indicated P. aeruginosa
strains. Tse1 and OprF serve as soluble and membrane controls, respectively.

(legend continued on next page)
acidic residue among Tse6 orthologous proteins, leading us
to postulate that it serves a role analogous to the conserved
glutamic acid at position 3 within the DT and CT mART families
(Figure S4) (Zhang et al., 2012). Consistent with this hypothesis,
purified Tse6_282-CT^D396A displayed approximately 225-fold
reduced NAD(P)+ glycohydrolase activity relative to the
wild-type protein, and a P. aeruginosa strain producing Tse6^D396A
from the native tse6 locus did not exhibit Tse6-based intercellular
intoxication (Figures 4F and 4G).

In accordance with our Tse6 truncation studies, interaction
with EF-Tu^Pa is mediated by residues immediately N-terminal
to the toxin domain (residues 265-291). This basic segment
forms two α helices (α0 and α1) that engage in numerous salt
bridges with acidic side chains on the GTPase domain (G
domain) of EF-Tu^Pa. Interestingly, the EF-Tu^Pa residues involved
in this interaction are found on helix D, which functions as the key
interaction site for both the guanine exchange factor EF-Ts and
the ribosome (Figure 4H) (Kawashima et al., 1996).

Interaction with EF-Tu Is Required for the Delivery of 
Tse6 into Recipient Cells 

Our structure of Tse6_265-CT-EF-Tu^Pa shows that a spatially
confined cluster of electrostatic interactions facilitates binding
of the proteins (Figures 5A and 5B). We reasoned that this interaction
mechanism might afford an opportunity to dissect the
functional significance of the interaction via site-directed mutagenesis.
A non-conserved leucine residue (Leu270) was identified
within the patch of basic amino acids on α0 that mediate
EF-Tu binding (Figure 5C). We postulated that an acidic residue
substituted at this position would disrupt charge complementarity
between the proteins. As predicted, Tse6_222-CT^L270E did
not co-purify with EF-Tu^Ec, whereas variants containing a
more conservative substitution at this site (L270A) or an analogous
substitution on the opposite face of α0 (A268E) retained
EF-Tu^Ec binding (Figure 5D).

Encouraged by our in vitro data, we next generated a
P. aeruginosa strain expressing Tse6^L270E-V from the native
tse6 locus. An immunoprecipitation and growth competition
experiment utilizing this strain showed that Tse6^L270E-V displays
a specific defect in EF-Tu interaction and is unable to intoxicate
recipient cells (Figures 5E and 5F). We conclude that association
with EF-Tu is essential for Tse6-based toxicity.

Tse6-based intercellular intoxication can be viewed as a number
of discrete processes. We considered the involvement of, and
requirement for, EF-Tu in (1) the stability of Tse6, (2) the enzymatic
activity of Tse6, (3) Tse6 export from donor cells, and (4)
entry of Tse6 into recipient cells. Since Tse6^L270E is present at
concentrations equal to those of the wild-type protein, we ruled out a
requirement for EF-Tu in Tse6 stability. Our biochemical data
show that the catalytic domain of Tse6 degrades NAD+ rapidly,
at a rate consistent with a known cytotoxic NAD+ glycohydrolase
enzyme (Ghosh et al., 2010). Therefore, one possibility is that
residues N-terminal to the toxin domain are auto-inhibitory and
that EF-Tu binding to this region relieves this inhibition. Indeed,
the activation loop of Tse6 differs significantly in position between
the Tse6_282-CT-Tsi6 and Tse6_265-CT-EF-Tu^Pa structures,
suggesting that either EF-Tu induces a conformational change
in the toxin or immunity protein binding excludes this loop from
the active site (Figure 4E). However, we found that a purified Tse6 variant
that includes the EF-Tu binding region (Tse6_222-CT) catalyzes
NAD+ hydrolysis at a rate indistinguishable from that of the toxin domain
alone (Figure 5G). Furthermore, the activity of this protein was
unaffected by the addition of excess EF-Tu^Pa.

Next, we considered the possibility that association of Tse6
with EF-Tu is required for export of the toxin from donor cells.
However, we found that both cellular and extracellular levels of
Tse6^L270E-V are similar to the wild-type protein (Figure S5).
These experiments further showed that, unlike the Hcp-associated
effector Tse1, Tse6 accumulation in the exo-proteome of
P. aeruginosa is only partially dependent on H1-T6SS function.
The significance of this is not yet understood; however, strains
lacking the T6S ATPase ClpV1 or the core integral membrane
protein TssM1 yielded similar results.

Since interaction with EF-Tu is dispensable for Tse6 catalytic
activity and export, we deduced that interaction with the translation
factor must be required for Tse6 to reach the cytoplasm of
recipient cells. In further support of this contention, we found
Tse6^L270E-V is as active in intracellular intoxication triggered by
Tsi6 depletion as the wild-type protein (Figure 5H). Together
with our findings that Tse6^L270E-V is incapable of Tse6-based
intercellular intoxication despite its unencumbered transit of
the T6SS, these data indicate that interaction with EF-Tu grants
Tse6 access to the cytoplasm of recipient cells.

Ultrastructure of a T6 Effector-VgrG Complex 

VgrG is thought to serve as the T6S protein that pierces the outer
membrane of recipient cells, granting its bound cognate effector(s)
access to the periplasm of target cells (Silverman et al.,



(B) Silver-stained SDS-PAGE analysis of proteins enriched by anti-VSV-G immunoprecipitation from P. aeruginosa strains encoding Tse6 (parental) and Tse6-V.
The labels indicate the identities of proteins that specifically co-precipitate with Tse6-V as determined by MS. In addition to their monomeric forms, VgrG1 and
Tse6 form a high-molecular-weight complex that is resistant to heat and SDS denaturation.

(C) A 17-amino-acid segment of Tse6 mediates interaction with EF-Tu. Coomassie-stained SDS-PAGE analysis of purified Tse6 truncations. All truncations were
expressed with Tsi6 and assessed for co-purification with endogenous EF-Tu^Ec.

(D) Overall structure of the Tse6_265-CT-EF-Tu^Pa complex. Secondary structure elements involved in the interaction are labeled.

(E) The Tse6 activation loop harbors Asp396 and rotates toward the active site of Tse6 in the Tse6_265-CT-EF-Tu^Pa structure relative to its position in the
Tse6_282-CT-Tsi6 structure. Dots denote a disordered segment (amino acids 400-408) of Tse6_282-CT that was not modeled.

(F) Tse6_282-CT^D396A exhibits significantly reduced NAD(P)+ glycohydrolase activity. Rate of NAD+ (left) and NADP+ (right) consumption by purified
Tse6_282-CT^D396A.

(G) Asp396 is critical for Tse6-based intercellular toxicity. Growth competition experiments between the indicated P. aeruginosa donor and recipient strains.
Donor and recipient strains were mixed 1:1, grown for 24 hr on solid media, and differentiated using blue/white screening.

(H) Helix D of EF-Tu is the site of interaction for both Tse6_265-CT (left) and the guanine exchange factor EF-Ts (right). In all panels, error bars represent ± SD (n = 3).
See also Figures S3 and S4 and Tables S1-S3.
Figure 5. Interaction with EF-Tu Is Required for Tse6-Based Intercellular Toxicity

(A and B) An electrostatic patch mediates interaction between EF-Tu and Tse6_265-CT. Electrostatic surface representation of EF-Tu^Pa (A) and Tse6_265-CT (B).
Residues participating in the interaction are labeled and outlined in black.

(C) Close-up view of the Tse6_265-CT-EF-Tu^Pa interaction. Secondary structure elements referred to in the text are labeled.

(D and E) An L270E variant of Tse6 does not interact with EF-Tu^Ec or EF-Tu^Pa. (D) Coomassie-stained SDS-PAGE analysis of purified Tse6_265-CT variants. All
variants were expressed with Tsi6 and assessed for their ability to co-purify with endogenous EF-Tu^Ec. (E) Silver-stained SDS-PAGE analysis of proteins enriched
by anti-VSV-G immunoprecipitation from P. aeruginosa strains encoding Tse6-V (parental) and Tse6-V^L270E. Enriched low-molecular-weight proteins (EagT6 and
Tsi6) not shown.

(F) Tse6 requires interaction with EF-Tu to intoxicate recipient cells. Outcome of growth competition experiments between the indicated P. aeruginosa donor
strains and a parental (ΔretS) or Tse6-susceptible (Δtse6 Δtsi6) recipient. The competitive index is calculated as the change (final/initial) in ratio of donor to
recipient CFU.

(G) Interaction with EF-Tu^Pa does not enhance NAD+ glycohydrolase activity of Tse6_222-CT. Reactions were performed using 500 pM Tse6_222-CT in the presence or
absence of 1 μM EF-Tu^Pa.

(H) Interaction with EF-Tu is not required for Tse6-based intracellular intoxication. NAD+ levels in the indicated P. aeruginosa strains 45 min after induction of Tsi6
degradation (top). Patches of the indicated P. aeruginosa strains grown for 24 hr at 37°C under Tsi6 depletion-inducing (+IPTG) conditions (bottom). The parental
strain is the same as in Figure 1C. Error bars represent ± SD (n = 3).

See also Figure S5 and Tables S2 and S3.
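
The competitive index defined in (F) can be computed directly from CFU counts. The following is a minimal sketch; the function name and the example counts are hypothetical and are not data from this study.

```python
def competitive_index(donor_initial, recipient_initial, donor_final, recipient_final):
    """Change (final/initial) in the donor-to-recipient CFU ratio."""
    counts = (donor_initial, recipient_initial, donor_final, recipient_final)
    if min(counts) <= 0:
        raise ValueError("CFU counts must be positive")
    initial_ratio = donor_initial / recipient_initial
    final_ratio = donor_final / recipient_final
    return final_ratio / initial_ratio


# Hypothetical counts: a 1:1 inoculum in which the donor ends up
# 50-fold more abundant than the recipient.
ci = competitive_index(1e6, 1e6, 5e8, 1e7)
print(f"competitive index = {ci:g}")
```

A value above 1 indicates that the donor gained ground relative to the recipient over the course of the competition, whereas a value near 1 indicates no net advantage.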
Figure 6. Two Conformations of the Tse6 Secretory Particle Revealed by Electron Microscopy

(A and B) Addition of detergent dissociates EagT6 from the Tse6 secretion particle and causes a conformational change. (A) Coomassie-stained SDS-PAGE
analysis and (B) representative class averages of purified Tse6-containing complex in the presence and absence of 0.03% β-D-dodecylmaltopyranoside.

(C) 3D density map and molecular fitting of the Tse6-Tsi6-VgrG1-EagT6-EF-Tu^Pa complex. The identity of each subunit is indicated. The model for Tse6_PAAR was
generated using Phyre (Kelley and Sternberg, 2009).

(D) Tse6 requires EagT6 for intracellular accumulation. Western blot analysis of Tse6 levels in the indicated P. aeruginosa strains. RNA polymerase (RNAP) is used
as a loading control.

(E) 3D density map and molecular fitting of the detergent-bound Tse6-Tsi6-VgrG1-EF-Tu^Pa complex. Scale bars, 20 nm.

See also Figures S6 and S7 and Tables S2 and S3.



2012). The PAAR domain of effectors associates with the tip of
VgrG; however, the placement of additional effector domains,
as well as accessory proteins, in this particle is not known
(Shneider et al., 2013). To gain insight into the topology of an
effector-loaded VgrG complex, we examined purified C-terminally
octa-histidine-tagged Tse6 in complex with Tsi6, VgrG1,
EagT6, and EF-Tu^Pa using negative-stain electron microscopy
(EM) (Figures 6A and 6B, left).



Analysis of 12,000 single particles permitted the calculation of
a 3D map of the complex resolved to 22 Å (Figures 6C and S6A-S6D).
VgrG proteins have a characteristic structure that was
readily apparent within the map (Shneider et al., 2013). Fortuitously,
the unpublished X-ray crystal structure of P. aeruginosa
VgrG1 is available in the PDB (PDB: 4MTK); the location of this
trimeric assembly in our structure was unambiguous. For estimating
the placement of Tse6, Tsi6, and EF-Tu, we were able
to utilize our assorted high-resolution structures of these proteins
to produce a ternary complex that largely conformed to regions
of density found near the end of the complex predicted to
initiate contact with recipient cells. An additional constraint on
the location of Tse6 is its PAAR domain, which could be modeled
with high confidence bound to the tip of VgrG1 (Figure 6C). The
91 residues connecting Tse6_PAAR to the first residue included in
our modeled ternary complex (residue 265) are predicted to form
a transmembrane helix, followed by a disordered glycine-rich
span (not modeled). Ni-NTA-nanogold labeling of the C-terminal
His8-tag of Tse6 provided support for our placement of this
linchpin protein within the complex (Figure S6E).

The positions of VgrG, Tse6, Tsi6, and EF-Tu^Pa left a protruding
region of density surrounding the PAAR domain of Tse6
unoccupied. We postulated that this region of the map corresponds
to EagT6. The structure of EagT6 was determined in a
high-throughput X-ray crystallography project and made publicly
available. The domain-swapped homodimeric protein bears
a distinctive horseshoe configuration that we placed in this unoccupied
density (Figures 6C and S7A). In this configuration, EagT6
would be predicted to bind Tse6_PAAR and its buttressing hydrophobic
segments. To test this prediction, we performed co-purification
experiments of the proteins in E. coli. In contrast to the
complex isolated with full-length toxin, Tse6_222-CT did not co-purify
with EagT6 (Figure S7B). Although the X-ray crystal structure
of EagT6 does not fully agree with the calculated map in this region,
these data provide biochemical support for our placement
of EagT6 in proximity to the N-terminal domains of Tse6.

To garner additional insight into EagT6 function, we probed
the capacity of P. aeruginosa ΔeagT6 to intoxicate Tse6-sensitive
recipient cells. This strain failed to elicit Tse6-based toxicity
but retained the capacity to intoxicate recipients using another
H1-T6S effector (Figure S7C). EagT6 associates with a region
of Tse6 rich in transmembrane domains, suggesting that the
accessory factor could act as a chaperone for the toxin. As predicted
for substrate-chaperone systems, we found that accumulation
of Tse6 is markedly diminished by the absence of EagT6
(Figure 6D). Genetic complementation of this phenotype was
achieved with ectopic expression of eagT6. Taken together
with the findings of Alcoforado and Coulthurst (2015) pertaining
to an EagT6-related protein in Serratia, we propose that EagT6
functions as a Tse6-specific chaperone.

Tse6 is a transmembrane protein; thus, its transport between
cells likely requires shielding of its hydrophobic domains. Based
on its orientation and position relative to Tse6 in our model,
we posited that EagT6 might chaperone Tse6 by shielding its hydrophobic
segments from aqueous media during intercellular
transport. In support of this hypothesis, EagT6 dissociates from
the effector complex in the presence of detergent (Figure 6A,
right). To gain structural insight into the consequence of EagT6
release, we examined the Tse6 particle depleted of this protein
by negative-stain EM and single-particle reconstruction
(Figure 6B, right). Analysis of 11,000 particles permitted the calculation
of a 3D map of the complex resolved to 19 Å (Figures 6E and
S6F-S6J). Remarkably, we observed a ~40 Å movement of the
EF-Tu-Tse6_265-CT-Tsi6 sub-complex from VgrG-Tse6_PAAR. The
localization of Tse6_265-CT within the displaced density was verified
by Ni-NTA-nanogold labeling (Figure S6K). Accompanying
this reorganization, we observed a region of unoccupied density
in proximity to the predicted site of the hydrophobic segments of
Tse6. Given the capacity of detergent to compete for EagT6
binding to the complex, we hypothesize that ordered detergent
molecules bound to the hydrophobic domains of Tse6 occupy
this density.

The significance of Tsi6 and EF-Tu within the effector complex 
is not understood. The toxicity conferred by depletion of Tsi6 from 
donor cells shows that the catalytic domain of Tse6 is present in 
the cytoplasm and is in complex with Tsi6 prior to export by the 
H1-T6SS. Therefore, EF-Tu also likely interacts with the toxin prior 
to export. If these proteins do not dissociate from the toxin during 
transit, the structures we obtained would represent the secreted 
complex. Alternatively, Tsi6 and EF-Tu could be removed during 
secretion and re-engage the complex in the cytoplasm of recipient 
cells. In this scenario, the complex we isolated would represent 
that found in recipient cells with immunity. In total, our ultrastruc- 
tural analyses of the Tse6 secretory particle define the architec- 
ture of an effector-loaded VgrG and suggest a mechanism for 
deployment of a membrane-associated toxin (Figure 7). 

DISCUSSION 

We have discovered that Tse6 intoxicates recipient cells by cata- 
lyzing the hydrolytic removal of the nicotinamide moiety from 
NAD+ and NADP+. This mechanism has not been described for 
an interbacterial toxin, but it is consistent with the general obser- 
vation that T6 effectors act on target molecules that are both 
essential and highly conserved among bacteria (Russell et al., 
2014). The consequence of NAD(P)+ degradation by Tse6 is stasis 
in most cells, rather than death. The relative benefit(s) of inhibiting 
the growth of target cells is not yet understood; however, 
the H1-T6SS Tse2 toxin also induces stasis in recipients (Li et al., 
2012). In instances of self-intoxication, the exchange of bacteriostatic 
toxins could promote the formation of persister cells. A 
non-mutually exclusive possibility is that when delivered within 
an effector cocktail, bacteriostatic and bactericidal toxins act 
synergistically. 

The requirement for NAD(P)+ extends to all forms of life, raising 
the possibility that Tse6, and related proteins exported by the 
T6SS, could intoxicate archaeal and eukaryotic cells. The SPN 
toxin of S. pyogenes provides precedent for the action of a bacterial 
NAD+ glycohydrolase against a eukaryotic target, although this 
is a structurally distinct toxin that utilizes pores introduced by 
Streptolysin O to gain entry into host cells (Madden et al., 2001; 
Smith et al., 2011). The H3-T6SS of P. aeruginosa has been shown 
to deliver a phospholipase D toxin to both bacterial and eukaryotic 
cells, implying that there is not a fundamental barrier to inter-domain 
targeting of effectors by this bacterium (Jiang et al., 2014). 

To our knowledge, the requirement for a housekeeping protein 
in the function of a T6S effector has not previously been 
observed. Likely owing to the central role of EF-Tu in translation, 
tufA, which encodes EF-Tu, is a slowly evolving bacterial gene 
(Lathe and Bork, 2001). Thus, if the role of the Tse6 interaction 
with a cellular housekeeping protein is to grant the toxin access 
to recipient cells as our data suggest, EF-Tu would allow the 
toxin to target phylogenetically diverse bacteria. The high con- 
centration of EF-Tu within cells could also contribute to a wide 
Figure 7. Proposed Model for Tse6 Transport by the T6S Apparatus 

The configurations of the Tse6 particle subunits in donor and recipient cells 
represent those determined in this study in the absence and presence of 
detergent, respectively. A representative EM class average for each is pro- 
vided for reference. In the donor cell, Tse6 and associated proteins are bound 
to the T6S sheath complex (gray; PDB: 3J90) (Clemens et al., 2015). The T6 
trans-envelope complex (composed of TssL, M, and J) is schematized to 
approximate its recently determined EM structure (Durand et al., 2015), 
whereas the T6 baseplate-like assembly is depicted in filled gray. In the model, 
donor cell EF-Tu (light blue) and Tsi6 disengage from the Tse6 secretion 
particle upon export from the donor cell. Upon crossing the outer membrane 
(OM) of the recipient cell, EagT6 dissociation frees the hydrophobic domains of 
Tse6 (yellow rectangles) for incorporation into the recipient inner membrane 
(IM). Recipient cell EF-Tu (dark blue) facilitates transfer of the NAD(P)+ 
glycohydrolase domain into the recipient cell cytoplasm by an unknown 
mechanism. Several other possibilities consistent with all available data are not 
presented. Most notably, donor-cell-derived EF-Tu may be exported as part of 
the secretion particle and facilitate Tse6 delivery into recipient cells, or donor- 
cell-derived EF-Tu may be excluded from the secretion complex. 

See also Tables S2 and S3 



target range for Tse6, as there would be more tolerance for 
weakened association between the two proteins driven by EF- 
Tu sequence divergence. 

Although the site of Tse6 binding to EF-Tu would preclude 
binding of the translation factor to EF-Ts, it is unlikely that Tse6 
affects translation (Kawashima et al., 1996). We find Tse6 present 
at low levels in P. aeruginosa; therefore, the yet lower levels 



within recipient cells would not sequester a functionally signifi- 
cant portion of EF-Tu. Indeed, there is growing evidence that 
the large pool of EF-Tu is exploited for multiple purposes within 
bacteria, including P. aeruginosa (Balasubramanian et al., 2008; 
Barel et al., 2008; Defeu Soufo et al., 2010; Kunert et al., 2007; 
Mohan et al., 2014). Barbier et al. (2013) have found that EF-Tu 
is posttranslationally modified by trimethylation at Lys5. 
These authors also found that this form of the protein localizes 
to the cell surface, where it mediates interactions with airway 
epithelial cells. Whether EF-Tu is actively secreted to the cell sur- 
face or if its presence there is a consequence of cell lysis was not 
determined. It is worth noting that the high concentration of EF- 
Tu present in culture supernatants through T6-independent 
mechanisms precluded measurement of the contribution of T6 
to EF-Tu export in our study. 

Given the changing chemical and physical environments that 
necessarily accompany translocation across multiple mem- 
branes, it is without doubt that effectors delivered intercellularly 
assume multiple states en route. We have captured just two of 
these for a T6S effector. Tse6 is the hub of a multi-protein com- 
plex in our structures; however, our biochemical data show that 
it need not interact with any of these proteins in order to catalyze 
NAD(P)+ degradation. This leaves many open questions, including 
how EF-Tu facilitates Tse6 translocation into recipient 
cells. From our current data, we cannot determine whether 
EF-Tu derived from donor cells, recipient cells, or both is critical 
for Tse6 activity. One appealing model consistent with our data 
holds that Tse6 is delivered to the target cell periplasm, where- 
upon EagT6 is released and the exposed transmembrane seg- 
ments of the toxin spontaneously insert into the inner membrane. 
At this point, translocation of residues N-terminal to the toxin 
domain and ensuing EF-Tu-binding could serve as a molecular 
ratchet that favors passage of the remaining toxin domain into 
the cytoplasm. Interestingly, the EF-Tu binding domain of Tse6 
is rich in basic residues, a property of many known cell-pene- 
trating peptides (Bechara and Sagan, 2013). 

While this study provides two snapshots of interbacterial pro- 
tein transport, it also highlights the challenges in understanding 
this intricate, multi-step process. The Tse6 particle we describe 
may provide a tractable system for the characterization of addi- 
tional secretory intermediates. A complete understanding of 
toxin entry into recipient cells could define novel routes for the 
delivery of antimicrobials. 

EXPERIMENTAL PROCEDURES 

Bacterial Strains, Plasmids, and Growth Conditions 

All P. aeruginosa strains generated were derived from the sequenced strain 
PAO1 (Stover et al., 2000). P. aeruginosa mutants and chromosomal fusions 
were generated by allelic exchange as described previously (Hood et al., 
2010). E. coli strains DH5α, BL21(DE3) pLysS, and SM10 were used for plasmid 
maintenance, gene expression, and conjugative transfer, respectively. A detailed 
list of strains and plasmids used in this study can be found in Tables S2 and S3. 

Crystallization and Structure Determination 

Details for the crystallization of Tse6282-cT-Tsi6, Tsi6, and Tse6265-cT-EF-Tu are 
described in the Supplemental Experimental Procedures. The structures of 
Tse6282-cT-Tsi6 and Tsi6 were solved by Se-SAD. The Tse6265-cT- 
EF-Tu structure was solved by molecular replacement using EF-Tu•GDP 
(PDB: 1EFC) as a search model. Details for structure determination and model 
refinement are described in the Supplemental Experimental Procedures 
(Table S1). 

Biochemical Assays 

Hydrolysis rates of NAD(P)+ by Tse6 were measured using a fluorescence 
endpoint assay as described previously (Johnson and Morrison, 1970). Determination 
of relative NAD+ and NADP+ levels from cell lysates was performed 
using the NAD/NADH-Glo and NADP/NADPH-Glo bioluminescence assays, 
respectively, as per the instructions of the manufacturer (Promega). Details 
can be found in the Supplemental Experimental Procedures. 

Bacterial Competition Assays 

Intraspecific competition assays between P. aeruginosa strains were performed 
as previously described (Whitney et al., 2014). Briefly, overnight cultures 
of P. aeruginosa strains were mixed in a 1:1 ratio and spotted onto 
0.2-μm nitrocellulose membranes overlaid on a 3% agar Luria broth no-salt 
plate. Competitive indices were calculated by enumerating donor/recipient 
CFU after 24 hr of growth at 37°C. All competitive indices were adjusted by 
the donor/recipient ratio in the initial inoculum. Recipient strains express the 
lacZ gene from a neutral phage attachment site to enable their differentiation 
from unlabeled donor via blue/white screening. 
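
The competitive index calculation described above can be sketched as follows. This is a minimal illustration, assuming the index is the donor/recipient CFU ratio after co-culture divided by the same ratio in the inoculum; the function name and the plate counts are hypothetical and are not taken from this study.

```python
def competitive_index(donor_final, recipient_final, donor_initial, recipient_initial):
    """Donor/recipient CFU ratio after co-culture, normalized to the inoculum ratio."""
    if min(recipient_final, donor_initial, recipient_initial) <= 0:
        raise ValueError("recipient and inoculum counts must be positive")
    final_ratio = donor_final / recipient_final
    initial_ratio = donor_initial / recipient_initial
    return final_ratio / initial_ratio


# Hypothetical CFU counts, for illustration only.
ci = competitive_index(
    donor_final=4.0e8,
    recipient_final=2.0e7,
    donor_initial=1.0e6,
    recipient_initial=1.0e6,
)
print(f"competitive index: {ci:.1f}")
```

An index above 1 indicates that the donor outcompeted the recipient relative to the starting mixture; normalizing by the inoculum ratio corrects for deviations from an exact 1:1 mix.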

Electron Microscopy and Image Analysis 

Protein samples were negatively stained with uranyl formate (SPI Supplies/ 
Structure Probe) and imaged using a JEOL1400 microscope equipped with 
a LaB6 cathode operated at 120 kV. Images were recorded at a magnification 
of 50,000× on a 4k × 4k CMOS camera F416 (TVIPS). Data analysis and 
further processing were done in SPARX (Hohn et al., 2007). Details can be found 
in the Supplemental Experimental Procedures. 

ACCESSION NUMBERS 

The accession numbers for the atomic coordinates of Tse6282-cT-Tsi6, Tsi6, 
and Tse6265-cT-EF-Tu are PDB: 4ZV0, 4ZUY, and 4ZV4, respectively. The 
accession numbers for the negative-stain EM maps of the detergent-bound 
and detergent-free Tse6 secretion particle are EMDB: EMD-3112 and EMD- 
3113, respectively. 

SUPPLEMENTAL INFORMATION 

Supplemental Information includes Supplemental Experimental Procedures, 
seven figures, three tables, and two movies and can be found with this article 
online at http://dx.doi.org/10.1016/j.cell.2015.09.027. 
SUMMARY 

Biological processes occur in complex environments 
containing a myriad of potential interactors. Unfortu- 
nately, limitations on the sensitivity of biophysical 
techniques normally restrict structural investigations 
to purified systems, at concentrations that are orders 
of magnitude above endogenous levels. Dynamic 
nuclear polarization (DNP) can dramatically enhance 
the sensitivity of nuclear magnetic resonance (NMR) 
spectroscopy and enable structural studies in bio- 
logically complex environments. Here, we applied 
DNP NMR to investigate the structure of a protein 
containing both an environmentally sensitive folding 
pathway and an intrinsically disordered region, the 
yeast prion protein Sup35. We added an exoge- 
nously prepared isotopically labeled protein to 
deuterated lysates, rendering the biological envi- 
ronment “invisible” and enabling highly efficient 
polarization transfer for DNP. In this environment, 
structural changes occurred in a region known to in- 
fluence biological activity but intrinsically disordered 
in purified samples. Thus, DNP makes structural 
studies of proteins at endogenous levels in biological 
contexts possible, and such contexts can influence 
protein structure. 

INTRODUCTION 

Structural investigations of biomolecules are typically confined 
to in vitro systems under limited conditions. Although investiga- 
tions yield invaluable insights, such experiments can never cap- 
ture all aspects of complex biological environments. Proteins 
must fold into their active conformations in complex environ- 
ments. This situation becomes perilous when considering pro- 
teins that must attain a particular conformation, but whose ener- 
getic folding landscapes are rather flat or have several local 



minima. In these cases, the environment can clearly influence 
the conformation by favoring one pathway over another. Such 
decisions can have striking biological consequences, as is the 
case for a variety of protein folding diseases (Dobson, 2001). 
The effect of environment becomes even more critical when 
considering the substantial fraction of the human proteome 
that encodes disordered proteins (Dunker et al., 2001). Intrinsi- 
cally disordered proteins (IDPs) are important components of 
the cellular signaling machinery, allowing the same polypeptide 
to undertake different interactions with different consequences 
(Wright and Dyson, 2015). Yet, structural characterization of 
these domains is notoriously difficult (Uversky, 2013). 

Yeast prions present both of these structural challenges as 
they have both environmentally sensitive protein folding land- 
scapes as well as intrinsically disordered regions. Yeast prions 
have provided a paradigm shift in our understanding of heritable 
biological information. They allow specific biological traits to be 
encoded and inherited solely through self-templating protein 
conformations. When a protein switches to its prion conforma- 
tion, its function changes. This altered function is passed from 
generation to generation by conformational self-templating and 
catalyzed division of the template to daughter cells. The most 
extensively studied yeast prion, [PSI+] (Cox, 1965), is an amyloid 
conformer of the translation termination factor Sup35. In purified 
amyloid fibrils of the prion domain of Sup35, called NM, the 
N-terminal domain (N) adopts a beta-sheet-rich amyloid confor- 
mation while the adjacent middle domain (M) is intrinsically disor- 
dered (Frederick et al., 2014; Krishnan and Lindquist, 2005; 
Luckgei et al., 2013; Toyama et al., 2007). However, this is un- 
likely to be the case in vivo: the M domain is known to interact 
with many other biomolecules, including protein remodeling fac- 
tors that regulate prion inheritance. As a consequence, muta- 
tions in the M domain (Helsen and Glover, 2012; Liu et al., 
2002), or changes in the levels of protein chaperones (e.g., 
Hsp70) and protein remodeling factors (e.g., Hsp104) (Kiktev 
et al., 2012; Masison et al., 2009; Tuite et al., 2011) have pro- 
found effects on prion propagation. NM also physically associ- 
ates with protein chaperones (Allen et al., 2005), and at least 
one chaperone binding site has been localized to the M domain 
of NM (Helsen and Glover, 2012). Finally, a host of genetic data 
suggests that protein-based inheritance is sensitive to the combination 
and stoichiometry of many other proteins, meaning that 
isolated study of prion structure can offer at best only partial 
insight into this paradigm-shifting biology. 

Interest in prions is highlighted by the fact that similar structural 
transitions figure in the pathologies of a wide variety of human 
diseases. Prion strains were first described for the mammalian 
prion protein, PrP (Chien et al., 2004; Prusiner et al., 1998), 
and polymorphic amyloid forms have been reported for a variety 
of amyloidogenic proteins associated with neurodegenerative 
disease (Guo et al., 2013; Kodali et al., 2010; Nekooki-Machida 
et al., 2009; Petkova et al., 2005). Upon structural characteriza- 
tion, only a portion of the protein is sequestered into the amyloid 
core. The amyloid cores of these fibers are flanked by intrinsi- 
cally disordered regions (Heise et al., 2005; Helmus et al., 
2008; Wasmer et al., 2009). More recently, amyloid forms of 
such proteins were demonstrated to have prion-like self-tem- 
plating dispersion properties in vivo (Jucker and Walker, 2013; 
Polymenidou and Cleveland, 2012; Watts et al., 2013). 

Nuclear magnetic resonance (NMR) is a powerful spectro- 
scopic method for studying molecular structure and dynamics. 
A key strength of this technique is that it can be used to study 
non-crystalline, amorphous samples. Indeed, there have been 
a handful of high-profile in-cell NMR studies (Band et al., 2013; 
Freedberg and Selenko, 2014; Inomata et al., 2009; Reckel 
et al., 2012; Sakakibara et al., 2009; Selenko et al., 2006; Vaiphei 
et al., 2011). These studies suggest that while protein structure 
can be perturbed, it is largely unchanged by the cellular context. 
However, these studies employed solution-state NMR to detect 
proteins at concentrations two or more orders of magnitude 
above endogenous levels inside cells, radically altering endoge- 
nous stoichiometries. Because solution-state NMR is limited by 
molecular tumbling times (that depend upon molecular size and 
solvent viscosity), the minority of the protein that might interact 
with cellular components would likely be undetectable. More- 
over, because this population would comprise a small fraction 
of the total biomolecule, it would be difficult, if not impossible, 
to detect the resulting signal loss. Solid-state NMR is not limited 
by molecular correlation times in this way. Instead, solid-state 
NMR is limited by its low sensitivity. Dynamic nuclear polariza- 
tion (DNP) has the potential to alleviate this limitation by dramat- 
ically increasing the sensitivity of NMR spectroscopy, through 
the transfer of the large spin polarization that is associated 
with unpaired electrons to nearby nuclei (Abragam, 1983; 
Slichter, 1990). Theoretically, DNP can reduce experimental 
times by more than five orders of magnitude; an experiment 
that would require decades without DNP can be collected in a 
day with DNP. However, just as for other structural biology tech- 
niques, DNP sensitivity enhancements are critically dependent 
on experimental conditions (Ni et al., 2013) and sample compo- 
sition (Akbey et al., 2010, 2013; Takahashi et al., 2014) and the 
specificity of NMR is critically dependent upon the choice of iso- 
topic labeling (Wang et al., 2013). There is growing interest in 
application of DNP to complex systems. Several groups have 
applied DNP to investigate membrane proteins that were over- 
expressed to high levels in bacteria and have directly examined 
both concentrated membrane fractions and whole cells (Jacso 
et al., 2012; Renault et al., 2012; Yamamoto et al., 2015). We 



report conditions that enable high polarization transfer effi- 
ciencies in biologically complex environments. These are large 
enough to allow the characterization of a single protein at endog- 
enous concentrations in its native environment. Structural 
methods to investigate either intrinsically disordered proteins 
or environmentally sensitive protein folding are limited. Here, 
we present a generalizable approach for investigation of both 
of these challenging structural puzzles that lie at the heart of 
both fundamental biological questions and human diseases. 
Moreover, we demonstrate that including the biological context 
can influence protein structure. 
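
The scale of the time saving quoted above follows from a simple relation: for a fixed target signal-to-noise ratio, the required signal averaging time falls with the square of the enhancement. The sketch below works through that arithmetic. It assumes the commonly cited upper bound of about 658 (the ratio of the electron and proton gyromagnetic ratios) for electron-to-1H transfer; the function names and the sample enhancement values are illustrative and are not part of the original analysis.

```python
import math

# Approximate ratio of the electron to proton gyromagnetic ratios, taken here
# as the theoretical upper bound on the DNP enhancement for 1H (assumption).
MAX_ENHANCEMENT_1H = 658.0


def time_saving_factor(enhancement):
    """Averaging time for a fixed signal-to-noise ratio scales as 1 / enhancement**2."""
    return enhancement ** 2


def orders_of_magnitude(enhancement):
    """Time saving expressed as a base-10 logarithm."""
    return math.log10(time_saving_factor(enhancement))


# Illustrative enhancement values; only the last is the theoretical bound.
for eps in (50.0, 115.0, MAX_ENHANCEMENT_1H):
    print(
        f"enhancement {eps:g}: {time_saving_factor(eps):.3g}x less averaging time "
        f"({orders_of_magnitude(eps):.1f} orders of magnitude)"
    )
```

Under these assumptions the theoretical bound corresponds to a saving of more than five orders of magnitude, and an enhancement of 115 turns one day of DNP acquisition into the equivalent of roughly 36 years without it.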

RESULTS 

NM Adopts an Amyloid Form in Cell Lysates at Low 
Concentrations 

We first confirmed that the NM protein adopted its active conformation 
at endogenous concentrations in a native environment. 
Previous studies have employed extensive serial dilution and 
propagation in purified in vitro conditions (Frederick et al., 
2014). To ensure that the exogenously added protein was faith- 
fully templated by the prion conformers from the cell lysate, we 
probed its structural state using semi-denaturing detergent 
agarose gel electrophoresis (SDD-AGE) (Bagriantsev et al., 
2006; Halfmann and Lindquist, 2008). NM did not form amyloid 
in lysates from cells that do not harbor the [PSI+] prion form of 
Sup35 (Figure 1B). In contrast, NM was templated into an amy- 
loid form by both purified pre-formed fibers and lysates from 
cells that harbored the prion. We determined that the concentration 
of templated, exogenously added NM was ~1 μM by immunoblot 
(Figure 1C), in good agreement with previously reported 
endogenous Sup35 concentrations of 2.5-5 μM (Ghaemmaghami 
et al., 2003). In this way, we prepared samples of isotopically 
labeled NM amyloids at endogenous levels in a complex 
biological environment. 

Sensitivity and Specificity of DNP Magic Angle 
Spinning NMR 

Having established that NM adopts an amyloid conformation in 
cellular lysates, we prepared recombinant, ^H, 13C-labeled NM 
and added it to cell lysates that had been grown in deuterated 
media with carbon isotopes in natural abundance. This created 
a spectroscopically active prion protein in an NMR silent cellular 
background. We prepared the sample for DNP magic angle spin- 
ning (MAS) NMR by addition of cryoprotectant (glycerol) and a 
stable biradical TOTAPOL (Song et al., 2006). We collected 1D 
13C{1H} cross polarization (CP) spectra of the cellular lysates 
both with and without microwave-driven polarization transfer 
from electrons to nuclei (DNP). Experiments using DNP resulted 
in significant signal enhancements relative to conventional NMR. 
DNP signal enhancements (ε) at 211 MHz were between 50- and 
115-fold (Figures 2 and S1). The carbonyl carbon enhancements 
were similar to the maximal enhancements obtained for the 
reference system proline (ε = 130) for this instrumental configuration. 
This establishes that DNP MAS NMR is well-suited to 
study complex biological mixtures. 

DNP enhances the NMR signal of all atoms in the sample. 
Interestingly, in samples with the uniformly 13C-labeled protein at 
Figure 1. NM Adopts an Amyloid Form in Cell Lysates at Low Concentrations 

(A) Prion status is maintained for yeast grown on deuterated media, indicating that the [PSI+] protein folding phenotypes were robust to growth in a deuterated 
environment. Phenotypically prion minus [psi-] (red) or prion plus [PSI+] (pink) yeasts were grown to mid-log phase in media made with either H2O or D2O and then 
spotted onto a one-fourth YPD plate. Plates were incubated at 30°C for 1 week. 

(B) Amyloid formation of purified NM-His6 was visualized by SDD-AGE using an anti-His6 antibody in prion minus ([psi-]) cell lysates that do not contain 
endogenous prion templates, in the presence of 2% (w/w) purified amyloid seeds, and in prion-containing ([PSI+]) cellular lysates that contain endogenous prion 
templates. As with the endogenous prion, boiling (b) destroyed the templated amyloid aggregates. 

(C) NM templated into the amyloid form in yeast cell lysates is not degraded and is present at endogenous levels. Full-length endogenous Sup35 runs at 100 kDa 
and is visualized with an antibody specific to the C-terminal domain. Cellular lysates both with and without exogenously added NM-His6 as well as a concentration 
gradient of purified NM-His6 were boiled in 2% SDS to denature any higher order aggregates and separated by SDS-PAGE before western blotting with an 
antibody specific for the His6 epitope. NM-His6 runs at 55 kDa. The endogenous concentration of Sup35 is between 2.5 and 5 μM. The ECL signal for NM-His6 in 
lysates is less intense than that of purified NM-His6 at a concentration of 1.2 μM, indicating that the concentration of exogenously added NM in the NMR sample is 
below 1.2 μM. 



endogenous concentrations, the 13C content from its natural 
abundance is an order of magnitude larger than that of added 
NM protein. However, because the natural abundance of 13C is 
1.1%, only 0.01% of the 13C sites in the cell lysate were adjacent 
to another 13C site. Conversely, all the 13C sites in the exogenously 
added uniformly 13C-labeled NM had adjacent 13C sites. 
To isolate signals from NM and filter out background signals 
from the cell lysates, we collected one-bond 13C-13C dipolar 
recoupled correlation spectra using proton driven spin diffusion 
(PDSD) (Szeverenyi et al., 1982). In this 2D experiment, on-diagonal 
peaks report on all 13C sites in the sample while off-diagonal 
peaks, or cross-peaks, report only on 13C sites that are directly 
bonded to another 13C site. To determine the contributions of 
cell lysates to the 13C-13C correlation spectra, we used signals 
from β1,3-glucan, a major cell wall component that is well-resolved 
from protein signals. As expected, the ratio of the 
cross-peak C1-C2 signal intensity relative to the diagonal C1 
signal for β1,3-glucan was 2.5% ± 2% of that for yeast grown 
on uniformly 13C-enriched glucose. However, the ratio of the protein 
carbonyl carbon (C')-carbon alpha (Cα) cross-peak signal intensity 
relative to the diagonal C' signal intensity for the protein 
backbone region was 10-fold higher (21% ± 2%) for the natural 
abundance sample containing added 13C-labeled NM than the ratio for 
the β1,3-glucan region. The protein signal was an order of 
magnitude larger than the lysate background expected from natural 
abundance, establishing that the cross-peak signals in the 
13C-13C correlation spectra report on the added 13C-labeled NM and not 
on 13C in the cellular lysates. To completely eliminate any concerns 
about the contribution of natural abundance 13C from 
the cellular lysates, samples of prion-containing yeasts for structural 
investigations were grown with 13C-depleted (99.9% 12C) 
glucose as the carbon source, further reducing the 13C-13C cross-peak 
intensity from the cellular lysates by two orders of magnitude. 
Thus, the combination of DNP with this isotopic labeling 
scheme provides the sensitivity and specificity to observe a protein 
at endogenous levels in a biologically complex native 
environment. 
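
The isotope arithmetic behind this filtering strategy can be checked with a short calculation. The sketch below assumes that 13C atoms are distributed independently, so that the probability that both carbons of a given bond are 13C is the square of the 13C fraction, and that lysate carbon fully reflects the isotopic composition of the glucose supplied. The function name is illustrative.

```python
def adjacent_pair_probability(c13_fraction):
    """Probability that both carbons of a C-C bond are 13C, assuming independent labeling."""
    return c13_fraction ** 2


natural = adjacent_pair_probability(0.011)   # natural abundance, 1.1% 13C
depleted = adjacent_pair_probability(0.001)  # 99.9% 12C glucose, i.e. 0.1% 13C
labeled = adjacent_pair_probability(1.0)     # idealized uniform 13C labeling

print(f"natural abundance lysate: {natural:.4%} of bonds are 13C-13C")
print(f"13C-depleted lysate:      {depleted:.4%} of bonds are 13C-13C")
print(f"reduction from depletion: {natural / depleted:.0f}-fold")
```

With these inputs the natural abundance value is about 0.01% and depletion lowers it by roughly 120-fold, in line with the two orders of magnitude stated above, whereas every bond in an idealized uniformly labeled protein contributes a cross-peak.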

To investigate the structural influence of cellular lysates on NM 
amyloid assembly, we compared spectra of NM fibers at endog- 
enous levels in cellular lysates to spectra of purified lysate-tem- 
plated NM fibers (Frederick et al., 2014). We conducted these 
experiments at higher magnetic fields (700 MHz rather than 
211 MHz) to achieve significant improvements in spectral resolution 
(Barnes et al., 2012; Michaelis et al., 2014). We collected 
a one-bond 13C-13C dipolar-assisted rotational resonance 
(DARR) (Takegoshi et al., 2001) correlation spectrum on 1 mg 
of cryoprotected, purified NM fibrils in 6 hr. For 10 μg of NM fibrils 
in unlabeled cellular lysates, we collected a one-bond 13C-13C 
DARR spectrum for 1 week. As expected, no cross-peaks for 
β1,3-glucan were present in spectra of cellular lysates grown in 
13C-depleted glucose. Inhomogeneous line broadening due to 
experimental temperatures required for DNP (83 K) potentially 
counteracts any gain in spectral resolution from higher magnetic 
fields. Thus, we compared spectra of purified NM fibers under 
DNP conditions to spectra of purified NM fibers at room temper- 
ature. In both samples, most of the resonances overlapped due 
to the number of sites and highly degenerate amino acid compo- 
sition of this protein, a common feature of prion proteins (Freder- 
ick et al., 2014). Nonetheless, the line widths of isolated side 
chain sites in the DNP spectra at 83 K are similar to those of 
Figure 2. Dynamic Nuclear Polarization Enhances NMR Signals in 
Cellular Lysates 

(A) Preparation of samples for DNP MAS NMR of proteins at endogenous 
levels in biological environments. The protein of interest is expressed on 
isotopically enriched media and purified. The cellular background comes from 
cells grown on media containing D2O. The cells are lysed and the isotopically 
labeled protein is added exogenously to whole lysate. The mixture is pelleted, 
the pellet is resuspended in a matrix containing stable radical and cryopro- 
tectant, and the mixture is frozen for analysis by DNP MAS NMR. 

(B) One-dimensional 13C spectra both with (black) and without DNP 
enhancement by microwaves (red). Dynamic nuclear polarization gave large 
signal enhancements (ε) for uniformly 13C-labeled NM in a deuterated 
matrix of cellular lysates containing a 60:30:10 (v/v) mixture of 
d8-glycerol:D2O:H2O and 10 mM TOTAPOL at 211 MHz/140 GHz with ω/2π = 4.3 
kHz and a sample temperature of 83 K. 

See also Figure S1. 



room temperature spectra (Figure S2). This establishes that DNP 
conditions did not compromise resolution gains at high magnetic 
fields, consistent with several other recent reports for cryogenic 
experiments on amyloid proteins (Debelouchina et al., 2010; 
Linden et al., 2011; Lopez del Amo et al., 2013). 
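
To put the two acquisitions described above (1 mg of purified fibrils in 6 hr versus 10 μg of fibrils in lysates in 1 week) on a common footing, a rough estimate can be made. The sketch assumes that the signal-to-noise ratio scales linearly with the amount of labeled protein and with the square root of acquisition time, and that the enhancement and all other conditions are identical for both samples; these are simplifying assumptions and the function name is illustrative.

```python
import math


def relative_snr(mass_ug, hours, ref_mass_ug, ref_hours):
    """S/N relative to a reference, assuming S/N ~ amount * sqrt(acquisition time)."""
    return (mass_ug / ref_mass_ug) * math.sqrt(hours / ref_hours)


# Lysate sample: 10 ug for 1 week. Reference: 1 mg purified fibrils for 6 hr.
ratio = relative_snr(mass_ug=10.0, hours=7 * 24, ref_mass_ug=1000.0, ref_hours=6.0)
print(f"expected S/N of the lysate spectrum relative to purified fibrils: {ratio:.3f}")
```

Under these assumptions the 28-fold longer acquisition recovers only about a 5-fold share of the 100-fold loss in material, which illustrates why the enhancement from DNP is needed for samples at endogenous levels.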

Native Environments Structure Intrinsically Disordered 
Regions 

Thus poised, we sought to determine the structural influences 
of the biological context on NM. The NMR chemical shift is a 
sensitive indicator of the secondary structure of the protein 
backbone. To investigate effects of lysates on NM secondary 
structure, we compared the backbone chemical shifts in the 
presence and absence of cellular lysates. To isolate signals 
from backbone C'-Cα sites, we projected the Cα region of the 
one-bond 13C-13C DARR correlation spectra into one dimension 
(Figure 3). We fit the carbonyl region of the projections to a sum 
of three Gaussians that described the chemical shift distributions 
Figure 3. The Secondary Structure of NM Fibers in Cellular Lysates 
Differs from the Secondary Structure of In Vitro Templated NM 

(A and B) Carbonyl carbon region of 13C-13C correlation spectra at 700 MHz 
using DNP MAS NMR of (A) cryoprotected purified NM fibers acquired in 6 hr 
and (B) cryoprotected NM fibers assembled in the presence of cellular lysates 
acquired in 1 week. 

(C and D) Examination of the carbonyl carbon (C') region of the spectra in 
projections of the Cα region (50-70 ppm indicated by dotted bracket) reveals 
the secondary structural composition of the protein backbone. The projection 
eliminates signals from non-backbone sites, such as the carbonyl moieties in 
the amino acid side chains like Asn and Gln. Dotted black lines indicate the 
expected chemical shift values for α-helical conformations of the protein 
backbone and highlight a large shift away from α-helical character for NM in 
lysates (D). The gray line represents the best-fitted solution to three Gaussian 
distributions describing the expected chemical shifts for the three possible 
secondary structural motifs: α helices (177.8 ± 1.5 ppm), random coils and 
turns (175.6 ± 1.5 ppm) and beta sheets (175.4 ± 1.55 ppm) (Wang and Jardetzky, 
2002). Fits to a sum of these three Gaussian distributions gave standard 
estimates of error of 0.84 (C) and 0.93 (D). Residuals are plotted in 
Figure S3. 

(E) Relative secondary structure contributions (in percent) as determined by 
intensity of each Gaussian distribution for the protein backbone of purified NM 
fibers (top) and NM fibers assembled in lysates (bottom). The error bars 
represent the standard error for the fitted intensity of each of the Gaussian 
distributions. 

(F and G) The fitted intensities for α helices (black), random coils and turns (light 
blue) and beta sheets (magenta) are plotted with the fits (gray) from (C) and (D). 
Figure 4. Complex Biological Environments Restructure Intrinsically 
Disordered Protein Regions 

(A-C) Side chains of NM fibers in cellular lysates have a different chemical 
environment than in vitro templated NM. Aliphatic region of (A) purified NM 
fibers at 283 K in protonated assembly buffer, (B) purified NM fibers at 83 K 
in 60% d8-glycerol, and (C) NM fibers at 83 K in 60% d8-glycerol templated 
into the amyloid form in the presence of cellular lysates. See also Figures S2 
and S4. 

(D) Amino acid sequence of NM with positions of lysines (magenta) and proline 
(cyan) highlighted. The N domain is black and the M domain is gray. 



for α helices, random coil, and beta sheet conformations (Wang
and Jardetzky, 2002) (Figure 3). At 283 K, NM fibers experience 
motion over a broad range of timescales (Frederick et al., 2014). 
The rigid regions of NM fibers at 283 K had a chemical shift dis- 
tribution consistent with a mix of turns and sheets (Figure S3). 
Cryoprotected NM fibers at 83 K had a chemical shift distribution
that was dramatically shifted toward α-helical values, consistent
with sequence-based secondary structural predictions for the M
domain (Chou and Fasman, 1974; Cuff et al., 1998; Kumar,
2013). This change is likely a result of secondary structural
stabilization effects from the low experimental temperature and the
cryoprotectant (Mehrnejad et al., 2011; Vagenende et al., 
2009). In contrast, cryoprotected NM fibers that had been polymerized
in cellular lysates had a chemical shift distribution that
was dramatically shifted away from α-helical values and toward
beta sheet values (Figure 3). Thus, the cellular context had a
profound effect on protein secondary structure.
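
The three-Gaussian decomposition of the carbonyl region can be illustrated with a short sketch. This is not the authors' fitting code: it assumes that the centers and widths are held fixed at the values quoted in the Figure 3 legend and that only the three intensities are fitted by linear least squares. The synthetic spectrum, the intensities used to build it, and the function names (`gaussian`, `fit_intensities`, `percent_contributions`) are hypothetical.

```python
import math

# Centers and widths (ppm) as quoted in the Figure 3 legend.
COMPONENTS = {
    "helix": (177.8, 1.5),
    "coil_turn": (175.6, 1.5),
    "sheet": (175.4, 1.55),
}

def gaussian(x, center, sigma):
    """Unit-height Gaussian line shape."""
    return math.exp(-0.5 * ((x - center) / sigma) ** 2)

def solve_linear(a, b):
    """Solve a small dense system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_intensities(shifts, signal):
    """Least-squares intensities of the fixed-shape components (normal equations)."""
    names = list(COMPONENTS)
    basis = [[gaussian(x, *COMPONENTS[n]) for x in shifts] for n in names]
    ata = [[sum(p * q for p, q in zip(bi, bj)) for bj in basis] for bi in basis]
    atb = [sum(p * y for p, y in zip(bi, signal)) for bi in basis]
    return dict(zip(names, solve_linear(ata, atb)))

def percent_contributions(intensities):
    """Relative contribution of each component, in percent of total intensity."""
    total = sum(intensities.values())
    return {n: 100.0 * v / total for n, v in intensities.items()}

# Synthetic, noise-free projection built from hypothetical intensities.
shifts = [170.0 + 0.05 * i for i in range(241)]  # 170-182 ppm
true = {"helix": 0.2, "coil_turn": 0.3, "sheet": 0.5}
signal = [sum(true[n] * gaussian(x, *COMPONENTS[n]) for n in true) for x in shifts]

fitted = fit_intensities(shifts, signal)
percents = percent_contributions(fitted)
```

Because the random coil and beta sheet components are centered only 0.2 ppm apart, their fitted intensities are strongly correlated; with noisy data the uncertainty on each would be considerably larger than for the α-helical component.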

The NMR chemical shift is a sensitive indicator of chemical 
identity and structural conformation (Wang and Jardetzky, 
2002). To determine which amino acid types undergo changes
in their secondary structure in cellular contexts, we therefore
compared the aliphatic region of the ¹³C-¹³C correlation spectra
because this region reports on the chemically diverse amino acid
side chains. The amyloid core of NM is largely composed of N, Q,
and Y residues. In purified room temperature samples, the
¹³C-¹³C correlation spectrum was consistent with an amyloid
core containing N, Q, and Y residues in rigid beta sheet and
turn conformations. Changes in the secondary structure at the
α carbon for N, Q, and Y from a beta sheet or random coil
conformation to an α-helical conformation result in an average change
in chemical shift of ~4 ppm (Wang and Jardetzky, 2002). The
average chemical shift values for this region of the spectra at 
83 K for both purified NM and NM in cell lysates were the 
same as those for the room temperature sample, consistent 
with the amyloid character of NM being maintained at low temperatures
and unperturbed by a biological context (Figures 4
and S4). Thus, the secondary structural changes (Figure 3)
were not derived from a structural rearrangement of the amyloid 
core. 

We next compared the aliphatic region of the ¹³C-¹³C correlation
spectra to determine if amino acid types found in the M
domain of NM were affected by the biological context. In purified
room temperature spectra of NM, the ¹³C-¹³C correlation
spectrum was consistent with previous findings that established
that the M domain is highly dynamic with random coil character
(Frederick et al., 2014; Luckgei et al., 2013); side chain
resonances for the amino acid types found only in the M domain
were absent. At 83 K, cross-peaks for methyl-bearing amino 
acids such as threonine, valine, isoleucine and leucine found in 
the M domain were absent from both spectra due to tempera- 
ture-dependent dynamically mediated relaxation of methyl- 
bearing amino acid side chains at this temperature (Bajaj et al., 
2009; Beshah et al., 1987). However, at 83 K, lysine Cδ-Cε and
proline Cγ-Cδ cross-peaks were present in the DNP MAS
NMR spectra of both purified NM and of NM in cell lysates.
Unlike the amyloid core residues, these amino acid types had very
different chemical environments with differences in chemical
shift of 5 ppm or greater depending on whether or not the fibers
were templated in cellular lysates (Figure 4). Proline residues are
present throughout the sequence of NM, while lysine residues
are found only in the M domain (Figure 4D), localizing the regions
experiencing large structural changes to the M domain. There
are 25 lysine residues in the M domain of NM that contribute to 
the signal, all of which have different chemical environments 
and therefore different chemical shifts. The dramatic change in 
the shape of the lysine Cδ-Cε cross-peak indicates that a large
proportion of the lysine side chains have a dramatically altered 
chemical environment in cellular lysates, indicating the majority 
of the M domain is involved. This establishes that the M domain, 
which contains chaperone-binding sites critical for faithful prion 
inheritance, makes many interactions with such components 
in vivo. 

Multiple lines of evidence reveal that chaperone proteins 
directly interact with NM fibers. For example, the Hsp70 chaperone
proteins Ssa1p and Ssa2p interact with NM aggregates
(Allen et al., 2005), are among the top one hundred most highly
expressed proteins (Ghaemmaghami et al., 2003), and are the major
components of amyloid aggregates isolated from yeast (Bagriantsev
et al., 2008). In prion-containing cells, NM forms membrane-free
cellular structures with specific cellular localizations
(Tyedmers et al., 2010). Within these structures, NM amyloid fibers
are deposited in highly ordered arrays of regularly spaced
fibrils. These arrays consist of bundles of fibers organized by
inter-fibril structures that are thought to be Hsp70 because
cells lacking Hsp70 can no longer form ordered arrays (Saibil
et al., 2012). This organization may be important for the faithful
inheritance of the prion by daughter cells or for mitigating the
toxicity that is otherwise associated with protein aggregation.
The direct observation of NM structure in its biological context 
indicates that these organizing protein-protein interactions are 
mediated through the M domain of the protein via the adoption 
of a beta sheet secondary structure by the majority of this other- 
wise intrinsically disordered region. This work suggests that 
disordered regions that are often observed in purified fibril sam- 
ples may be intimately involved with cellular components to 
create a self-organization mechanism that coordinates fiber 
deposition. 

DISCUSSION 

Application of high-field DNP MAS NMR methodology to a chal- 
lenging biological system allowed us to pursue a scientific ques- 
tion that was previously impossible due to limits in instrumental 
sensitivity. Without DNP, these experiments would not be 
possible. With DNP MAS NMR, we detected prion fibrils that 
had been assembled in a complex cellular environment contain- 
ing all of the potential organizing protein components, such as 
chaperones, at their endogenous levels and stoichiometries. 
We established such fibers are structurally distinct from purified 
fibers in a region that is intrinsically disordered and highly dy- 
namic in purified systems. The cellular environment structures 
an intrinsically disordered region. Sup35 is not unique; over a 
third of encoded proteins are predicted to be intrinsically disor- 
dered (Dunker et al., 2001). Indeed, intrinsically disordered pro- 
tein regions have important roles in many biological processes, 
yet their structural characterization is notoriously difficult. Using 
DNP NMR, we can directly observe a protein of interest in its bio- 
logical context. We found that the intrinsically disordered domain 
makes many direct interactions with cellular components. For 
NM, this suggests the M domain may be responsible for medi- 
ating interactions with the inter-fiber structures involved in prion 
fibril bundle organization visualized using in vivo cryotomogra- 
phy (Saibil et al., 2012). 

Our results demonstrate not only that structural studies of pro- 
teins in their native contexts are possible, but also that the native 
context can and does have a dramatic influence on protein struc- 
ture. We anticipate that our methodology will enable structural 
investigations of heterotypic quaternary interactions between a 
protein of interest and cellular constituents. The methods 
described in this work can be extended to further investigations 
of protein conformation in biologically relevant environments. 
For example, protein structures can be determined in cellular 
contexts that have been modified, either genetically by deletion 
or overexpression of a protein or by the addition of small mole- 
cule agonists. Moreover, because the protein of interest is pre- 
pared exogenously, the full suite of specific isotopic enrichment 
schemes can be employed (Jaipuria et al., 2012) or segmentally 
isotopically labeled proteins can be used to obtain atomic level 
structural insights for otherwise crowded spectra (Volkmann 
and Iwaï, 2010). These approaches will be particularly useful
for structural investigations of protein folding and mis-folding in 
native and perturbed environments. There are a large number 
of protein folding diseases and work across many fields of study 
is continually uncovering genetic, physical and chemical modulators
of their pathobiology. Our approach will allow direct observation
of the structural consequence of such modulators.
Thus, this work provides the framework to answer structural 
questions about the toxic and non-toxic conformations of dis- 
ease-associated proteins in a way that is directly informed by ge- 
netic backgrounds and biological phenotypes. This will allow us 
to investigate how genetic backgrounds modify the energetic 
landscape of protein folding and will enable tight coupling of ge- 
notypes, phenotypes, and environments with specific structural 
arrangements. 

EXPERIMENTAL PROCEDURES 
Sample Preparation 

Both untagged NM and C-terminally His6-tagged NM were expressed and
purified as described elsewhere (Serio et al., 1999). Uniformly labeled NM
samples were prepared by growing BL21(DE3)-Rosetta Escherichia coli in
the presence of M9 media with 2 g D-glucose (Cambridge Isotope
Labs). Purified, lysate-templated NM seeds for the purified fiber sample were
prepared as described elsewhere (Frederick et al., 2014), using cell lysates
from a strong [PSI+] yeast strain. One milligram of purified denatured
¹³C-labeled NM was diluted 120-fold out of 6 M GdHCl into 4 ml of lysis buffer
(see below) containing 0.02 mg preformed fibers. The reaction was allowed
to polymerize for 24 hr at 4°C and fibers were collected by ultracentrifugation
at 430,000 × g for 1 hr. Bradford analysis revealed that removal of the
supernatant decreased the total protein content of the sample by one-third. The
pellet was resuspended in a 60:30:10 (v/v/v) mixture of ¹³C-depleted d8-glycerol
(99.9% ¹²C):D2O:H2O (Rosay et al., 2010) containing 10 mM of the stable
biradical TOTAPOL (Corzilius et al., 2014; Lange et al., 2012; Song et al., 2006).

Cell Lysate Samples for DNP 

Phenotypically strong [PSI+] yeast were grown in a 20 ml culture volume at
30°C to mid-log phase in YPD media made with protonated carbon sources
and 100% D2O. Because we use protonated carbon sources, the final
deuteration level for the lysates is estimated to be 70% (Leiting et al., 1998). Cells
maintained their [PSI+] status in deuterated media (Figure 1A). Cells were
collected by centrifugation (5 min, 4,000 × g) and washed once with water
and once with D2O. Pellets were suspended in 200 µl of lysis buffer (50 mM
Tris-HCl pH 7.4, 200 mM NaCl, 2 mM TCEP, 5% d8-¹³C-depleted glycerol,
1 mM EDTA, 5 µg/ml of aprotinin and leupeptin and 100 µg/ml Roche protease
inhibitor cocktail; lysis buffer was 80% [v/v] D2O). Cells were lysed by bead
beating with 500 µm acid washed glass beads for 8 min at 4°C. After bead
beating, the bottom of the Eppendorf tube was punctured with a 22G needle
and the entire lysate mixture was transferred to a new tube. Purified denatured
¹³C-labeled NM was diluted 150-fold out of 6 M GdHCl to a final concentration
of 5 µM and the mixture was allowed to polymerize, quiescent, at 4°C for 24 hr.
Unassembled NM was removed by centrifugation at 20,000 × g for 1 hr at 4°C
and removal of the supernatant. The ~30 µl pellet was resuspended in 30 µl of
100% d8-¹³C-depleted glycerol containing 20 mM TOTAPOL and transferred
to a 4 mm sapphire rotor. The final radical concentration was 10 mM (Corzilius
et al., 2014) and the glycerol concentration was 60% (Rosay et al., 2010). The
cell lysate sample for high field DNP was made analogously, except that yeast
cells were grown in SD-CSM media made with D2O and 2% (w/v) protonated
¹³C-depleted glucose (99.9% ¹²C, Cambridge Isotope Labs) as the carbon
source. Uniformly ¹³C-labeled samples were grown using U-¹³C glucose (99%,
Cambridge Isotope Labs) as the carbon source. The final sample volume
was 20 µl and the sapphire rotor had a 3.2-mm diameter.
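
The concentrations quoted in this procedure can be cross-checked with simple dilution arithmetic. The sketch below is illustrative only: it assumes additive volumes and a pellet that contains no radical before resuspension, and the function names are hypothetical.

```python
def stock_concentration(final_conc, fold_dilution):
    """Concentration before dilution, in the same units as final_conc."""
    return final_conc * fold_dilution

def diluted_concentration(initial_conc, fold_dilution):
    """Concentration after dilution, in the same units as initial_conc."""
    return initial_conc / fold_dilution

def mixed_concentration(conc_added, vol_added, vol_existing):
    """Concentration after adding a solution to a volume that contains no solute."""
    return conc_added * vol_added / (vol_added + vol_existing)

# A 150-fold dilution to 5 µM implies a denatured NM stock of about 750 µM.
nm_stock_uM = stock_concentration(5.0, 150)

# The same dilution carries 6 M GdHCl down to about 40 mM in the lysate.
gdhcl_residual_mM = diluted_concentration(6000.0, 150)

# 30 µl of 20 mM TOTAPOL added to a ~30 µl pellet gives the stated 10 mM.
totapol_mM = mixed_concentration(20.0, 30.0, 30.0)
```

The TOTAPOL result reproduces the final radical concentration of 10 mM given in the text; the stock and residual denaturant values are inferred from the stated dilution factor and are not reported by the authors.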

Immunohistochemistry 

Cell lysate samples were made as described above, except NM-His6 was
substituted for NM. SDD-AGE was performed as described (Halfmann and
Lindquist, 2008), and NM was visualized using an anti-His6 antibody. Cell
lysates were fractionated by SDS-PAGE, transferred to nitrocellulose and
probed with both anti-His6 and anti-Sup35 antibodies. For SDD-AGE analysis
we prepared cellular lysates as described above and added 5 µM purified
denatured NM-His6 to reactions containing cellular lysates from prion minus
([psi−]) cultures, purified NM fibers prepared in isolation (2% seeding w/w),
and cellular lysates from prion plus ([PSI+]) cultures. For western blot analysis,
lysate samples were denatured by incubation at 95°C for 10 min in the
presence of 2% SDS before fractionation to denature amyloid aggregates.
Secondary antibodies were coupled to horseradish peroxidase. Blots were
visualized by a standard ECL analysis.

Spectroscopy 

DNP MAS NMR experiments were performed on custom-designed home-built
instruments, consisting of a 212 MHz (¹H, 5 T) (Becerra et al., 1993) and a 697
MHz (¹H, 16.4 T) (Barnes et al., 2012; Michaelis et al., 2014) NMR spectrometer
(courtesy of Dr. David Ruben, Francis Bitter Magnet Laboratory, MIT)
equipped with custom-built 140 and 460 GHz gyrotrons (Joye et al., 2006)
(i.e., high power microwave devices generating up to 12 W), respectively.
DNP MAS NMR spectra were recorded on home-built 4 mm (211 MHz)
quadruple resonance (¹H, ¹³C, ¹⁵N, and e⁻) or 3.2 mm (700 MHz) triple
resonance (¹H, ¹³C, and e⁻) cryogenic probes equipped with Kel-F stators
(Revolution NMR). Microwaves were guided to the sample via circular
overmoded waveguide in which the inner surface has been corrugated to reduce
mode conversion and ohmic losses. Sample temperatures were maintained
below 85 K, with spinning frequencies of ωr/2π = 4.3-10 kHz.

¹³C{¹H} cross polarization (Pines et al., 1973) spectra were acquired with a
contact time of 1.5 ms. Recycle delays were chosen as TB (polarization buildup
time constant) × 1.26 (Figure S1), yielding optimum sensitivity per unit of time.
The recycle delays were 4.6 s and 8 s for 211 MHz and 700 MHz, respectively. A
series of DARR spectra was recorded using either a mixing period of 6
or 15 ms, 64-512 co-added transients, and between 60 and 100 t1 increments.
All data were acquired using high-power TPPM ¹H decoupling (γB1 > 83 kHz).
Enhancements at 211 MHz are reported in Figure 2 and those at 700 MHz
were estimated at ~8 to ~10. DNP enhancements at both fields were ~80%
of the maximal enhancements recorded on a standard sample of proline.
Experimental data were processed using RNMR (1D) or NMRPipe (Delaglio
et al., 1995) (2D) and analyzed using Sparky (Goddard and Kneller, 2006). ¹³C
NMR data were referenced to adamantane (Morcombe and Zilm, 2003)
(40.49 ppm at room temperature), and KBr was used to set the magic angle.
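
The factor of 1.26 used for the recycle delay can be recovered from a simple model. The sketch below assumes mono-exponential polarization buildup with time constant TB and noise that averages as the square root of the number of scans, so that sensitivity per unit time scales as (1 - exp(-t/TB)) / sqrt(t). These modeling assumptions and the function names are ours, not taken from the text.

```python
import math

def sensitivity_per_sqrt_time(x):
    """Relative sensitivity per unit time for a recycle delay x = t / TB."""
    return (1.0 - math.exp(-x)) / math.sqrt(x)

def optimal_delay_ratio(lo=0.5, hi=3.0, tol=1e-10):
    """Bisection on exp(x) = 1 + 2x, the stationary point of the function above."""
    def g(x):
        return math.exp(x) - 1.0 - 2.0 * x
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(lo) * g(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

ratio = optimal_delay_ratio()  # close to 1.26

# Buildup time constants implied by the stated recycle delays (inferred values).
tb_211_s = 4.6 / 1.26
tb_700_s = 8.0 / 1.26
```

Setting the derivative of the sensitivity function to zero gives exp(x) = 1 + 2x, whose positive root is x ≈ 1.26. Under this model the stated delays of 4.6 s and 8 s correspond to buildup time constants of roughly 3.7 s and 6.3 s.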
SUMMARY

Self-avoidance, a process preventing interactions of
axons and dendrites from the same neuron during
development, is mediated in vertebrates through
the stochastic single-neuron expression of clustered
protocadherin protein isoforms. Extracellular cadherin
(EC) domains mediate isoform-specific homophilic
binding between cells, conferring cell recognition
through a poorly understood mechanism. Here,
we report crystal structures for the EC1-EC3 domain
regions from four protocadherin isoforms representing
the α, β, and γ subfamilies. All are rod shaped and
monomeric in solution. Biophysical measurements,
cell aggregation assays, and computational docking
reveal that trans binding between cells depends on
the EC1-EC4 domains, which interact in an antiparallel
orientation. We also show that the EC6 domains
are required for the formation of cis-dimers. Overall,
our results are consistent with a model in which
protocadherin cis-dimers engage in a head-to-tail
interaction between EC1-EC4 domains from apposed cell
surfaces, possibly forming a zipper-like protein
assembly, and thus providing a size-dependent
self-recognition mechanism.

INTRODUCTION 

The human brain is composed of ~10 billion neurons, each of 
which can connect with up to thousands of others. Neuronal 
self-avoidance is a process in which dendrites and axons origi- 
nating from the same neuron repel one another but can freely 
interact with neurites from other neurons. The combined proper- 
ties of self-recognition and non-self-discrimination require that 
contacting neurons display diverse cell-surface identities that
allow for discrimination between self and non-self (Hattori et al.,
2009; Zipursky and Grueber, 2013; Zipursky and Sanes, 2010). 

In Drosophila and other invertebrates, self-avoidance is mediated
by Dscam1 proteins, immunoglobulin superfamily members
produced by alternative splicing of the Dscam1 pre-mRNA.
This cell-autonomous and stochastic alternative splicing
can theoretically produce up to 19,008 Dscam1 isoforms with
distinct ectodomains, each of which has highly specific homophilic
trans binding specificity (Hattori et al., 2008; Miura et al.,
2013; Schmucker et al., 2000; Wojtowicz et al., 2007). Distinct
cell-surface identities are generated in Drosophila by the
stochastic expression of a small set of Dscam1 isoforms in each
neuron (Miura et al., 2013). Homophilic interactions between
identical sets of protein isoforms on the surface of neurites
from the same neuron result in repulsion and neurite
self-avoidance (Hattori et al., 2008). The expression of even a single
Dscam1 isoform is sufficient for self-avoidance of neurites
from the same neuron (Hughes et al., 2007; Matthews et al.,
2007; Soba et al., 2007). However, robust non-self-discrimination,
which allows processes from different neurons to freely
interact, requires thousands of distinct Dscam1 isoforms (Hattori
et al., 2009).

Recent studies suggest that, in vertebrate nervous systems,
neuronal self-avoidance functionality is provided, at least in
part, by the clustered protocadherins (Pcdhs) (Chen and Maniatis,
2013; Zipursky and Grueber, 2013; Zipursky and Sanes,
2010). Mammalian Pcdhs are encoded in a contiguous genomic
locus composed of three adjacent gene clusters (Pcdh α, β,
and γ), each of which contains close to 60 “variable” exons
(58 in mice; Figure 1A) (Wu and Maniatis, 1999). Only a few
variable exons are stochastically chosen for expression in each cell
by a mechanism involving alternative promoter choice (Ribich
et al., 2006; Tasic et al., 2002). Each variable exon encodes
an entire Pcdh ectodomain region consisting of six tandem
extracellular cadherin (EC) domains, a single transmembrane
region, and a short cytoplasmic region. In the α and γ gene
clusters, a “constant” C-terminal cytoplasmic region encoding an
intracellular domain (ICD) is joined to the variable ectodomain

Figure 1. Crystal Structures of Four Pcdh
EC1-EC3 Isoforms

(A) The Pcdh genomic locus contains three adjacent
clusters of variable exons. Each exon encodes
an entire ectodomain comprising six EC
domains, a transmembrane (TM) domain, and
a short cytoplasmic region. Alpha and gamma
clusters also contain three constant exons that
encode a cluster-specific ICD, which are joined
by pre-mRNA splicing for alpha and gamma clusters.
C-type Pcdh exons are shown in pink and
light blue for the alpha and gamma clusters,
respectively.

(B) Crystal structures of EC1-EC3 regions from
PcdhαC2, Pcdhβ1, PcdhγA8, and PcdhγC5
shown in ribbon representation. Ca²⁺ ions are
drawn as green spheres. N-glycans and
conserved O-mannose residues are drawn as
sticks. The inter-domain calcium binding sites are
arranged similarly to those observed in classical
cadherins (expanded view). See also Figure S1
and Table S1.

(C) Comparison of the PcdhγC5 and type I classical
C-cadherin structures. The overall architecture
of classical cadherin ectodomains has a
curved shape with an approximate 90° angle
between EC1 and EC5 (Boggon et al., 2002). In
contrast, the architecture of Pcdh EC1-EC3
domain regions is characterized by an extended
zigzagged conformation.

(D) EC2-EC3 angles distinct from classical cadherins
account for the extended zigzagged
conformation of the Pcdh structures. EC1-EC3
domains are drawn as blue (PcdhγC5) and yellow
(C-cadherin) ovals. Angles shown are between
principal axes of inertia for adjacent domains.



exon by pre-mRNA splicing. The β cluster does not contain such
a constant region, and therefore, β-Pcdhs lack an ICD.
The α and γ gene clusters also encode a small set of “C-type”
Pcdhs, which are divergent from other members of their
respective clusters and appear to have distinct functions (Figure 1A)
(Chen et al., 2012). Deletion of the Pcdhγ gene cluster in mice
leads to the disruption of self-avoidance in retinal starburst
amacrine cells and Purkinje cells with phenotypes similar to those
described for Dscam1 deletion mutants in Drosophila (Lefebvre
et al., 2012).

Like invertebrate Dscam proteins, Pcdh isoforms engage in 
isoform-specific trans homophilic interactions (Schreiner and
Weiner, 2010; Thu et al., 2014). It is remarkable that Pcdhs, 
with only 58 isoforms, can mediate neural self-recognition and 
non-self-discrimination similar to Dscams, which have up to 
tens of thousands of distinct extracellular isoforms. Central to 
this capability is the observation that a single mismatched 
Pcdh isoform can interfere with recognition between cells that 
express an otherwise matching set of Pcdhs (Thu et al., 2014). 
Understanding the mechanism underlying this “interference” 
phenomenon is crucial, as it is likely to explain how only 58 
Pcdh isoforms can provide sufficient functional diversity to 
enable self-recognition and non-self-discrimination in the ner- 
vous system comparable to the much more diverse Drosophila 
Dscam gene. 
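
A rough counting argument helps put the isoform numbers in perspective. The sketch below counts the distinct unordered sets of Pcdh isoforms that a cell could express. It assumes, as a simplification of the interference phenomenon described above, that any two cells expressing different sets are distinguished; the number of isoforms expressed per cell is treated as a free, hypothetical parameter because the text says only that "a few" are chosen.

```python
import math

PCDH_ISOFORMS = 58        # mouse clustered Pcdh isoforms (from the text)
DSCAM1_ISOFORMS = 19008   # theoretical Dscam1 isoforms (from the text)

def distinct_isoform_sets(n_isoforms, n_expressed):
    """Number of unordered sets of n_expressed isoforms drawn from n_isoforms."""
    return math.comb(n_isoforms, n_expressed)

def min_expressed_to_exceed(n_isoforms, target):
    """Smallest set size whose combination count exceeds target, or None."""
    for k in range(1, n_isoforms + 1):
        if distinct_isoform_sets(n_isoforms, k) > target:
            return k
    return None

k_needed = min_expressed_to_exceed(PCDH_ISOFORMS, DSCAM1_ISOFORMS)
sets_at_k = distinct_isoform_sets(PCDH_ISOFORMS, k_needed)
```

Under these assumptions, three co-expressed isoforms already give 30,856 possible sets, more than the 19,008 theoretical Dscam1 isoforms. The count is an upper bound on identities and says nothing about how expression is actually distributed across neurons.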



Here, we report crystal structures of Pcdh extracellular protein
fragments comprising the previously mapped Pcdh
specificity-determining EC1-EC3 domains for PcdhαC2, Pcdhβ1,
PcdhγA8, and PcdhγC5 isoforms, thus providing examples
from all three Pcdh gene clusters. Guided by these structures,
we used two orthogonal mutagenesis approaches
(surface-saturating arginine mutagenesis and bioinformatics-derived
predictions) to map the isoform specificity-determining regions at
the amino acid level using cell aggregation and biophysical
experiments as readouts. The two approaches yielded consistent
results, revealing an essential role for EC1 through EC4 in trans
homophilic interactions and for EC6 in cis interactions. On the
basis of these findings, we propose a model for Pcdh-mediated
cell-cell recognition that is consistent with the remarkable ability
of these cell-surface proteins to provide diverse single-cell
identities to vertebrate neurons.

RESULTS 

Structures of Pcdh EC1-EC3 Region Fragments from α,
β, and γ Subfamilies

We determined crystal structures of proteins composed of the
three N-terminal EC domains of mouse PcdhαC2, Pcdhβ1,
PcdhγA8, and PcdhγC5 to a resolution of 2.4 Å, 3.3 Å, 2.9 Å,
and 2.9 Å, respectively (Figure 1B and Table S1). We focused

Table 1. Analytical Ultracentrifugation Analysis of Clustered-
Protocadherins Homo-oligomerization

Protein                        Oligomeric State         Kd Oligomerization (µM)
α7 EC1-EC3                     monomer                  NA
αC2 EC1-EC3                    non-specific dimer       242 ± 0.1 (a)
β1 EC1-EC3                     monomer                  NA
γA8 EC1-EC3                    disulfide-linked dimer   NA
γC5 EC1-EC3                    monomer                  NA
γC5 EC1-EC3 extended N-term    monomer                  NA
α7 EC1-EC5                     dimer                    2.9 ± 0.5
αC2 EC1-EC4                    dimer                    20 ± 1.2
αC2 EC1-EC5                    dimer                    5.9 ± 0.8
γC5 EC1-EC5                    dimer                    100 ± 4.3
γB6 EC1-EC4                    dimer                    29 ± 4.9
γA8 EC1-EC4                    dimer                    30 ± 1.5
γC5 EC2-EC6                    dimer                    18 ± 0.2
γA8 EC2-EC6                    dimer                    23 ± 8.1
αC2 EC2-EC6                    dimer                    8.9 ± 0.3
αC2 EC1-EC6                    tetramer                 0.1 (b)
γC5 EC1-EC6                    tetramer                 7.6 (b)
γB6 EC1-EC6                    tetramer                 0.2
γA8 EC1-EC4 I116R              monomer                  NA
γC5 EC2-EC6 S116R              dimer                    14
αC2 EC1-EC6 S118R              tetramer                 1.8 (b)
γC5 EC1-EC6 S116R              dimer                    5.7

(a) [...] = 2; isodesmic KI = 359 µM; KI/KD = 1.48.
(b) [...] of a tetramer was obtained by locking the cis-interaction Kd as
obtained from EC2-EC6 deletion constructs.



on protein fragments containing EC1-EC3, since the results of 
earlier cell aggregation experiments indicated that Pcdh iso- 
form-specific recognition was mediated via the EC2-EC3 do- 
mains and that the EC1 domain is required for trans binding 
(Schreiner and Weiner, 2010). 

The four structures show high overall similarity (Figures 1B
and S1A). Each structure consists of three EC domains, each
with the two-layer β sheet fold observed in classical cadherins.
Successive domains are connected by calcium-binding linkers,
each of which coordinates three Ca²⁺ ions utilizing side chains
in the same conserved motifs (Figure 1B). These motifs are
also conserved within type I and type II classical cadherins
with the exception of the EE motif (bottom of EC1 domain;
Figure 1B), which is present only in type II cadherins. In contrast
with previous conclusions (Schreiner and Weiner, 2010) but
consistent with the presence of Ca²⁺ at the inter-domain linkers
and in common with classical cadherins, we have found that cell
aggregation of Pcdhs is Ca²⁺ dependent (Figure S1B). Despite
these similarities to classical cadherins, the Pcdh isoform
structures are distinctive in several aspects. Most notably, the overall
arrangement of the three EC domains in each structure is much
straighter than the curved classical cadherin architecture
(Figure 1C). This “straight-rod” architecture arises from an extended
zigzagged conformation: an arrangement that is generated
primarily by a very different EC2-EC3 angle than classical
cadherins (>31° difference; Figure 1D).

In addition, mass spectrometry analyses showed that all four
isoforms contain two sites of O-mannosylation at residues 194
and 196 (PcdhγC5 sequence numbering; Figures 1B, S1G, and
S1H). These positions are conserved in sequence among most
Pcdh isoforms (Figure S1G) and among classical cadherins
(Vester-Christensen et al., 2013), suggesting that these O-glycans
play important functional roles. O-mannosylation of cadherins
and protocadherins was recently discovered (Vester-Christensen
et al., 2013), and it was further shown that O-mannosylation
of E-cadherin is essential for preimplantation development of the
mouse embryo (Lommel et al., 2013).

The Pcdh structures show local Pcdh-specific embellishments
on the EC domain fold. In particular, Pcdh EC1 domains show a
number of differences from vertebrate cadherin EC1 domains
(Figure S1D), as was previously observed in NMR structures of
Pcdhα4 and Pcdhβ14 EC1 domains (Morishita et al., 2006).
The A strand is shorter than that of classical cadherins and lacks
the conserved Trp-2 residue, which anchors the strand-swap
trans-binding interface of classical cadherins (Figures S1C and
S1D; Posy et al., 2008). The EC1 EF loop region in each of the
Pcdh structures contains a disulfide-constrained loop formed
by a Pcdh-specific CX5C motif. The EC2 and EC3 domains of
the Pcdh structures are each most similar to either the EC1
or EC2 domain from the atypical cadherin-23 (RMSD 1.5 and
1.2 Å). However, the D and E strands of Pcdh EC2 domains,
and the CD loop region of EC3, are significantly longer than
found in cadherin-23 or in classical cadherins (Figure S1E). There
are also distinctive differences among the structures of the four
Pcdh isoforms. The EC1 BC loop helix, C strand, and CD loop
regions display distinct conformations in all four structures
(Figure S1F). In EC3, the two C-type structures (PcdhαC2 and
PcdhγC5) have a longer FG loop than Pcdhβ1 and PcdhγA8, a
feature conserved among α and C-type Pcdhs (Figure S1F).

Analysis of the molecular packing of the four Pcdh EC1-EC3 
structures revealed different crystallographic contacts for each 
isoform with no interfaces in common. Interfaces exhibiting 
typical protein-protein interface attributes were not identified in 
any of the crystal forms analyzed. 

Analytical Ultracentrifugation and Cell Aggregation
Assays Define the Multimeric Structure of Pcdhs 

We expressed and purified proteins from a C-terminal deletion 
series comprising EC1-EC6, EC1-EC5, EC1-EC4, and EC1- 
EC3 and a construct comprising domains EC2-EC6 where EC1 
was deleted. Using analytical ultracentrifugation (AUC), we as- 
sessed the oligomerization state of each of these ectodomain 
fragments in solution. With the exception of PcdhγA8, all EC1-EC3
Pcdh isoform fragments behaved as monomers (Table 1).
This finding was consistent with our crystal structures in which
no apparent binding interfaces were detected. The PcdhγA8
EC1-EC3 fragment formed a disulfide-linked dimer through
cysteine 283 in the EC3 domain (Figures S2A and S2B); however,
this disulfide bond is likely artifactual since it is not detected in
the larger PcdhγA8 isoform fragment (EC1-EC4) (Table 1).

In contrast to monomeric EC1-EC3 fragments, EC1-EC4 or
EC1-EC5 Pcdh fragments were observed to self-associate as

Figure 2. Elements of Pcdh cis and trans Binding

(A) Correlating multimerization states of truncated Pcdh proteins with their cell-cell recognition properties. Cells transfected with Pcdh deletion series plasmid
constructs were tested for aggregation. With the exception of EC2-EC6 Pcdh fragments and PcdhγC5 EC1-EC4, all deletion proteins that formed oligomers in
solution also mediated cell aggregation. Full-length Pcdhα4 includes the EC6 domain from PcdhγC5 so it could be delivered to the cell surface.

(B) Probing homophilic interaction interface by arginine-scanning mutagenesis. Residues mutated to arginine are drawn in space filling representation. In blue are
mutations that did not disrupt recognition, in orange are mutations that weakened recognition, and in red are mutations that abolished cell-cell recognition.
Excluding residue 142, all the effective arginine mutants are located along one side of the molecule.

(C) Cell aggregation experiments showing the mutations in part (B) that weakened or abolished interactions. See also Figure S2C.

(D) In other Pcdh isoforms, residues analogous to the effective PcdhγC5 arginine mutants had similar effects on the cell-cell recognition in the majority of cases.



dimers with dissociation constants (Kp) in the micromolar range 
(2.9-1 00 pM) that varied significantly between isoforms (Table 1 ). 
The EC1 -deleted constructs comprising domains EC2-EC6 also 
formed homodimers in solution, with Kd values in the low micro- 
molar range (8.9-23 pM). Importantly, AUC measurements for 
complete ectodomains, including EC1-EC6, could be fit only 
to a tetramer (dimer-of-dimers) model, indicating a crucial role 
for the EC6 domain in Pcdh association (Table 1). 
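
To give a feel for what dissociation constants in this range imply, the following minimal sketch computes the fraction of protomers found in dimers for a simple monomer-dimer equilibrium (2M ⇌ D, with KD = [M]²/[D]). The function name and the 10 μM total concentration are illustrative assumptions and are not taken from the AUC analysis itself; only the 2.9 μM and 100 μM values come from the range quoted above.

```python
import math


def dimer_fraction(total_uM, kd_uM):
    """Fraction of protomers in dimers for 2M <-> D with K_D = [M]^2 / [D].

    total_uM is the total protomer concentration, [M] + 2[D].
    Solving 2[M]^2 / K_D + [M] - total = 0 for the free monomer gives
    [M] = K_D * (sqrt(1 + 8 * total / K_D) - 1) / 4.
    """
    free_monomer = kd_uM * (math.sqrt(1.0 + 8.0 * total_uM / kd_uM) - 1.0) / 4.0
    return 1.0 - free_monomer / total_uM


# Hypothetical 10 uM total protein, evaluated at the two ends of the
# reported K_D range for EC1-EC4 / EC1-EC5 fragments.
for kd in (2.9, 100.0):
    print(f"K_D = {kd} uM -> dimer fraction = {dimer_fraction(10.0, kd):.3f}")
```

At a total concentration equal to KD, exactly half of the protomers are dimeric in this model, which is a convenient sanity check.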

We expressed similarly truncated Pcdhs in K562 cells and assessed
their ability to mediate cell aggregation. K562 cells provide a robust
assay for Pcdh cell-cell recognition, as they do not express
endogenous Pcdhs and do not spontaneously aggregate in liquid culture
(Reiss et al., 2006; Schreiner and Weiner, 2010; Thu et al., 2014).
Cells expressing the EC1-EC3 fragment, which was found to be monomeric
in solution, failed to produce cell aggregates (Figure 2A). In
contrast, with the exception of PcdhγC5 EC1-EC4, which forms a
non-natural disulfide between monomers, cells expressing EC1-EC4,
EC1-EC5, or the complete ectodomain (EC1-EC6) showed extensive
aggregation for all isoforms tested (Figure 2A). Consistent with
previous studies (Schreiner and Weiner, 2010; Thu et al., 2014), cells
expressing Pcdh EC2-EC6 fragments, which were shown above to
homodimerize in solution, did not aggregate (Figure 2A). Detection of
two independent dimers, one of which (generated by EC1-EC4 and EC1-EC5
fragments) correlates with cell-cell aggregation, whereas the other
(generated by EC2-EC6 fragments) does not (Figure 2A), strongly
suggests that EC1-EC4 and EC1-EC5 fragments mediate trans interactions
while the EC2-EC6 fragments mediate cis interactions involving the
most membrane-proximal domain, EC6 (see also below). The observation
that full-length ectodomains form apparent tetramers in AUC strongly
suggests that this molecular species corresponds to a dimer-of-dimers
formed by these two distinct interfaces, one mediating cis and the
other trans interactions.

Structural Elements of the trans Binding Interface
Arginine-Scanning Mutagenesis 

Selected non-basic surface residues of the PcdhγC5 EC1-EC3 domains
revealed in the crystal structure were individually mutated to
arginine, and the homophilic recognition function of these
single-arginine mutant proteins was assessed using the K562 cell
aggregation assay. Selected basic surface residues were mutated to
glutamic acid. As expected, the majority of single-point mutant
proteins exhibited wild-type cell aggregation phenotypes (Figure S2C).
In contrast, cells transfected with the arginine point mutant L87R in
the EC1 domain, S116R and T142R in the EC2 domain, and M301R and E302R
in the EC3 domain of PcdhγC5 showed no detectable aggregation (Figures
2B and 2C). Cells transfected with the EC2 S114R mutation showed
diminished homophilic binding (Figure S2C). S114 and S116 are located
in the AB loop connecting the A and B β strands in EC2, whereas M301
and E302 are located in the FG loop of EC3. All are located on one
side of the molecule and are very close to one another in space, thus
defining a potentially continuous homophilic recognition interface
with elements distributed over the EC2 and EC3 domains. Notably, L87
in EC1 faces in the same direction, although T142 in EC2 does not.

To determine whether this binding region is unique to PcdhγC5, we
produced mutants for isoforms from all three Pcdh gene clusters for
residues structurally equivalent to PcdhγC5 positions 87, 116, and
301. Mutations equivalent to 301R abolished homophilic recognition for
isoforms from all three gene clusters (Pcdhα7, PcdhαC2, Pcdhβ6,
PcdhγA8, and PcdhγB6; Figure 2D). Homophilic recognition was abolished
for mutations equivalent to 116R for isoforms from the α and γ gene
cluster members (Pcdhα7, PcdhαC2, and PcdhγA8), but not for the
isoforms we tested from the β and γB clusters (Figure 2D). Finally,
mutations equivalent to L87R abolished homophilic recognition for
PcdhγA8 and diminished homophilic recognition for Pcdhα7. It is
possible that homophilic recognition for the Pcdhβ6 and PcdhγB6
isoforms may not involve residues 87 in EC1 and 116 in EC2, or
alternatively, arginine mutants of these residues might not
appropriately test their contribution to binding. Below, we show that
isoforms from the α and β gene clusters do in fact utilize interface
residues in the EC2 AB loop region and others in close structural
proximity to EC1 residue 87.

Domain Shuffling to Identify Specificity-Determining
Domains

Within each of the mouse gene clusters, there exist pairs of Pcdh
isoforms (Pcdhα7 and Pcdhα8; Pcdhβ6 and Pcdhβ8; PcdhγA8 and PcdhγA9)
with greater than 80% pairwise sequence identity within their EC1-EC4
domain regions. Despite this high identity, these pairs display strict
homophilic specificities (Thu et al., 2014). In order to help identify
the binding interface, we produced chimeras in which EC domains were
shuffled between the closely related isoforms. These proteins were
tagged at the C terminus with either of the fluorescent proteins
mCherry or mVenus and tested for binding specificity in the K562 cell
assay. We confirmed that all three pairs bind strictly homophilically
(Figure 3A, 1-4; Figure 3B, 1-4; Figure 3C, 1-4).

The results of cell aggregation experiments using different 
chimeric constructs are summarized in Figures 3 and S3. These 
results are presented in such a way that two closely related wild- 
type “parent” proteins appear at the left of each panel, while 
each figure indicates whether a particular chimera co-aggre- 
gates with one or the other parent protein or prefers to aggregate 
homophilically. Figure 3D summarizes the data presented in 
Figures 3A-3C. All chimeric constructs containing EC1-EC3 
domains from one isoform and EC4-EC6 domains from another 
co-aggregated with the wild-type “parent” isoform that con- 
tained the same EC1-EC3 domains (Figures 3A-3C, panel 6, 
and Figures S3B and S3D, panel 13), whereas chimeric constructs with
just EC2-EC3 shuffled preferred to aggregate homophilically
(Figures S3A-S3E, panels 11 and 12).

Despite the fact that shuffling EC1-EC3 is sufficient to swap
specificity in close pairs, our AUC and cell aggregation assay results
(Table 1 and Figure 2A) indicate that all four N-terminal domains
(EC1-EC4) are required for trans homophilic recognition. We therefore
generated a chimera of PcdhγA8 in which domains EC2-EC4 were replaced
with the corresponding domains of the closely related PcdhγA9 isoform,
while domains EC5-EC6 were replaced with the EC5-EC6 domains of the
distant PcdhγB6 isoform, which would not be expected to interact in
trans with PcdhγA8 or PcdhγA9. Cells expressing this chimera adhere to
cells expressing PcdhγA9, indicating, consistent with AUC data, that
the EC4 domain plays a role in determining homophilic binding
specificity (Figure 3C, panel 8). This conclusion is also supported by
cell aggregation studies using chimeras where EC1 is derived from one
parent and EC2-EC6 from another. In all cases, these chimeras
co-aggregate with the parent containing the same EC2-EC6 domains
(Figures S3A, S3C, and S3E, panel 1; Figures S3B and S3D, panel 2).
Since domains EC5 and EC6 are not required for trans binding, these
results also implicate EC2-EC4 as sufficient to determine homophilic
specificity.

Figure 3. Pcdh trans Binding Depends on the Four N-Terminal Domains EC1-EC4

(A-C) Domain-shuffled chimeras of closely related isoforms and their wild-type counterparts were assayed for binding specificity. Swapped specificity was noted for chimeras in which either the EC1-EC3 or EC2-EC4 domains were replaced with the corresponding domains of closely related isoforms. See also Figure S3.

(D) Schematic representation of the domain-shuffled isoforms and their observed binding specificities to their wild-type isoform counterparts.

The experiments reported in Figure S3 help define the minimal number
of domains within the EC1-EC4 region that determine the binding
properties of a chimera. The presence of a single domain is never
enough to mediate co-aggregation with a parent isoform containing this
domain (Figures S3A, S3C, and S3E, panels 2, 4, and 6; Figures S3B and
S3D, panels 1, 3, and 5), but in some cases, a mismatched single
domain is capable of disrupting binding to the parent isoforms (Figure
S3C, panel 5; Figure S3D, panel 6; Figure S3E, panel 3). In a few
cases, the presence of just two domains in common is sufficient to
mediate co-aggregation with a parent even if the other four domains
are different. This can be seen in a chimera containing EC1 and EC3
from γA9 and EC2 and EC4-EC6 from γA8, which co-aggregates with
wild-type γA9 (Figure S3C, panel 10), and a chimera containing EC1 and
EC2 from β8 and EC3-EC6 from β6, which co-aggregates with wild-type β8
(Figure S3E, panel 8). Overall, these results are consistent with all
four N-terminal domains, EC1-EC4, contributing to trans binding, with
the relative contributions of each domain to specificity varying from
one isoform to another.

Rational Design of Point Mutations to Identify 
Specificity-Determining Residues 

Sequence alignment of specificity-determining EC3 domains shows that
Pcdhα7 and Pcdhα8 differ in five amino acids, whereas PcdhγA8 and
PcdhγA9 differ in eight (Figure 4A). Notably, in both cases, three of
these residues are located in the same structural element: the FG loop
(Figures 4A, 5A, and 5C). In the case of PcdhγA8 and PcdhγA9, the
three variable FG loop residues are highly conserved within their
respective orthologs (Figure 4B). Together, these data strongly
suggest that these three EC3 domain FG loop residues act as
specificity determinants for α and γ Pcdh isoforms.

Figure 4. Candidate Specificity-Determining Residues

(A) Multiple sequence alignment of the three closely related Pcdh isoform pairs, along with PcdhγC5. Highlighted in gray are positions conserved in all Pcdh sequences. Sequence positions that differ between the closely related isoforms are shown in red; a subset of these residues determines binding specificity. Residues swapped between isoforms and assayed for binding properties are boxed. Secondary structure from PcdhγC5 is shown at the top of the alignment.

(B) Multiple sequence alignment of the FG-loop region for PcdhγA8 and PcdhγA9 orthologs. Three of the residues that differ between mouse PcdhγA8 and PcdhγA9 are highly conserved in orthologs (highlighted in red), suggesting their functional importance.

To test this hypothesis experimentally, we swapped the three residues
(Figure 5) between the EC3 domains of closely related isoforms and
tested their binding specificities with their “parent” native
isoforms. We produced chimeras with the three FG-loop residues of one
isoform replaced with the corresponding residues of its close-pair
isoform. These three-residue-swapped mutants were tested, along with
their native “parents,” in the K562 cell aggregation assay. Cells
expressing an isoform in which the three FG-loop residues were
replaced with those from the close-pair isoform intermixed with cells
expressing the wild-type isoform with residues identical to those at
the shuffled positions (Figures 5A and 5C). In contrast, these cells
segregated from cells expressing the wild-type isoform from which the
EC3 domain originated (Figure S4). We conclude that the three variable
residues of the EC3 FG loop are specificity determining in the closely
related α and γ isoforms.

A similar analysis was carried out for EC1 and EC2 domains with
comparable results. As with the EC3 domains, we analyzed close isoform
pairs (Figure 4A) and identified candidate specificity-determining
residues located on the EC1 C strand and EC2 AB region (Figure 5). We
validated these assignments by showing that shuffling residues between
EC2 domain AB regions resulted in swapped specificities for close-pair
isoforms from all three Pcdh gene clusters (Figures 5 and S4).
Shuffling residues between EC1 domain C strand regions was sufficient
to swap EC1 specificities from Pcdhβ6 to that of Pcdhβ8 or from Pcdhα7
to Pcdhα8. The contribution of this region in the Pcdhγ pair could not
be determined because shuffling of residues in this region resulted in
a protein that could not mediate cell aggregation (Figure S4D). We
note that swapping EC1 specificities from Pcdhβ6 to Pcdhβ8 or EC2
specificities from Pcdhα7 to Pcdhα8 or from PcdhγA9 to PcdhγA8
required the alteration of only a single residue (residue R41N, L114P,
and S114N for β, α, and γ, respectively; Figure 5).



Rational and Random Mutagenesis Identify the Same 
Functional Binding Surfaces 

Figures 2 and 5 list specificity-determining residues identified from
arginine scanning and bioinformatics-based mutagenesis. The finding
that two different approaches implicate the same structural regions in
Pcdh homophilic binding, and that these regions are in common for
isoforms from different Pcdh gene clusters, indicates that these
regions (the EC1 C and G strands, the EC2 AB loop, and the EC3 FG
loop; Figure 5D) are likely to contribute to determining the binding
specificities for other Pcdh isoforms as well. As shown above, EC4
contributes to the trans binding specificity in a similar way to that
of EC1. However, we focused on the EC1-EC3 domains because this is the
region for which we have atomic-level structures.

Figure 5. Structural Elements of the Canonical Pcdh trans Binding Interface

(A-C) Assessing specificity-determining residues. Binding properties of wild-type isoforms (left side of each panel) or constructs with shuffled residues (top of each panel) were tested separately for each EC domain. Cases in which shuffled residues swapped specificities are indicated by an orange outline. Residues shuffled between closely related isoforms are shown in magenta on surface representations of the Pcdhα7, Pcdhβ6, and PcdhγA8 structures. Sequence alignments of shuffled regions are shown. See also Figure S4.

(D) Correspondence between trans interface residues identified by arginine scanning and close-isoform pair analysis. Single arginine mutant residues that abolish or diminish homophilic binding, highlighted in red and orange respectively, are found in the same structural regions as the shuffled residues (see also Figure 2). Residues that swap binding specificity between closely related isoforms are shown in magenta on surface representations of the PcdhγC5 crystal structure.

AUC Experiments on Mutant Proteins Confirm that Pcdh
trans Interactions Occur via EC1-EC4 Domains,
whereas cis Interactions Occur via the EC6 Domain
We have provided evidence from both AUC and cell aggregation assays
that the EC1-EC4 domains mediate Pcdh trans interactions, whereas the
EC6 domain mediates an independent Pcdh cis interaction. To provide
further evidence for these findings, we expressed and purified various
domain-truncated constructs of PcdhγA8-I116R, PcdhγC5-S116R, and
PcdhαC2-S118R. Since an arginine at these positions ablates trans
binding in cell aggregation assays, these mutant constructs should
only affect the Pcdh trans association but not the cis association in
AUC experiments. As expected, the EC1-EC4 fragment of I116R PcdhγA8
behaved differently from its wild-type counterpart and was monomeric
in solution (Table 1). In contrast, we found that, similar to its
wild-type counterpart, the EC2-EC6 fragment PcdhγC5-S116R behaved as a
dimer with KD similar to wild-type EC2-EC6. This observation suggests
that the EC2-EC6 protein dimerizes in cis through a region that is not
involved in the trans interface (Table 1). Finally, the complete
ectodomain of PcdhαC2 containing an S118R mutation displayed a
tetramerization affinity that was an order of magnitude lower than
that of the wild-type protein. Similarly, the S116R mutant of PcdhγC5
EC1-EC6 did not form tetramers (as does its wild-type counterpart) but
rather, similar to the EC2-EC6 fragment, self-associates as a dimer.
Since trans binding has been ablated by this mutation, the observed
dimer must correspond to association in cis (Table 1).

The trans Homophilic Interface Is Formed via
Head-to-Tail Interactions of EC1-EC4 Domains
Computational Docking Yields Antiparallel Orientations

We carried out modeling studies in an effort to elucidate the
dimerization mode of Pcdhs. We limited our modeling to EC1-EC3, for
which we have determined crystal structures and have identified
specificity-determining residues. We used the M-zdock program (Pierce
et al., 2005) to produce symmetric homodimeric models for the EC1-EC3
domain regions of PcdhαC2, Pcdhβ1, PcdhγA8, and PcdhγC5. We generated
thousands of models for each crystal structure and used the
experimentally identified specificity determinant residues to filter
the docked models, requiring models to include these residues at the
binding interface. A second constraint required docking models to have
a buried surface area at the binding interface of more than 1,200 Å²
(600 Å² per protomer). Applying these two conditions reduces the
number of docked models from thousands to 149: 23, 40, 40, and 46 for
PcdhγA8, Pcdhβ1, PcdhαC2, and PcdhγC5, respectively. We then
structurally clustered the filtered docked homodimers with the
expectation that there would be more docked structures near the native
conformation.
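
The two-step filter described above can be sketched in a few lines. This is only an illustration under stated assumptions: the `DockedModel` record, its field names, and the rule that every required residue must be present at the interface are hypothetical choices made for the example, and none of this reproduces the M-zdock output format or the authors' actual scripts. The 1,200 Å² threshold and the per-isoform counts of surviving models are taken from the text.

```python
from dataclasses import dataclass

MIN_BURIED_AREA = 1200.0  # total buried surface area cutoff in Å^2 (from the text)


@dataclass
class DockedModel:
    """Hypothetical record for one docked homodimer model."""
    isoform: str
    interface_residues: frozenset  # residue numbers found at the docked interface
    buried_area: float             # total buried surface area in Å^2


def passes_filters(model, required_residues):
    """Keep a model only if it satisfies both constraints.

    Assumption: all required residues must lie at the interface.
    """
    has_residues = required_residues <= model.interface_residues
    return has_residues and model.buried_area > MIN_BURIED_AREA


def filter_models(models, required_by_isoform):
    """Apply both constraints to a list of docked models."""
    return [m for m in models if passes_filters(m, required_by_isoform[m.isoform])]


# Counts of filtered models per isoform as reported in the text.
filtered_counts = {"PcdhγA8": 23, "Pcdhβ1": 40, "PcdhαC2": 40, "PcdhγC5": 46}
total_filtered = sum(filtered_counts.values())  # matches the reported total of 149
```

The surviving models would then be passed to a structural clustering step, which is not sketched here.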

Notably, the majority of the filtered docked homodimeric Pcdhs (62.5%)
adopted a head-to-tail orientation of the two molecules in which the
EC2 domain of one molecule interacts with the EC3 domain of its
partner (Figures 6A and S5A, i and ii). Furthermore, most structures
with this binding mode place the EC1 domain of one molecule adjacent
to the expected position of the EC4 domain of its partner (Figure 6A).
Only three of the docked and filtered complexes had a head-to-head
orientation (two for PcdhγC5 and one for PcdhαC2; Figure S5A, iii),
whereas filtered solutions for Pcdhβ1 and PcdhγA8 resulted solely in
solutions with a head-to-tail orientation. We note that it is the
application of the two constraints, one of which was experimentally
derived, that results in this distribution of binding modes.

Experimental Validation of a Head-to-Tail Orientation
The computational evidence for a head-to-tail dimer, taken together
with our identification of EC1-EC4 as the specificity-determining
region, suggests that EC1 interacts with EC4 and EC2 interacts with
EC3. In order to validate this model, we carried out cell aggregation
assays on chimeras of the γA8 and γA9 Pcdh isoforms, which were
designed to determine which domains physically interact. As shown in
the schematic diagrams in Figure 6B (panels 1-3), head-to-tail binding
would result in a dimer where all EC2/EC3 and EC1/EC4 interactions
involve domains from the same wild-type protein. In all three cases,
the chimeras form mixed aggregates, thus providing strong evidence for
our proposed model of the Pcdh-Pcdh interface. Note that, if the
monomers bound in a head-to-head orientation, some interacting domains
would be derived from different wild-type proteins so that mixed
aggregates would not be expected to form.

Figure 6B (panels 4 and 5) provides direct evidence that EC1 interacts
with EC4 and EC2 interacts with EC3. Comparing panel 4 to panel 1, the
only difference between the two is that there is a mismatch between
EC4 and EC1 in panel 4. The two cell populations in panel 4 form
separate aggregates, indicating that this single mismatch is
sufficient to ablate trans dimerization. An identical conclusion
regarding EC2 and EC3 is reached by comparison of panel 5 to panel 2.
Here again, a single-domain mismatch inhibits co-aggregation even
though the remaining three domains are correctly matched.
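
The logic of these chimera experiments can be made concrete with a toy predictor. The sketch below is a deliberate simplification and an assumption on our part: it treats each domain-domain contact as all-or-none, requiring both contacting domains to come from the same parent isoform, and the chimeras shown are hypothetical examples rather than the actual constructs in Figure 6B.

```python
def head_to_tail_match(a, b):
    """Predict co-aggregation for an antiparallel (head-to-tail) dimer.

    a and b are 4-tuples giving the parent isoform of EC1..EC4.
    EC1 contacts the partner's EC4, EC2 contacts EC3, and vice versa.
    """
    return all(a[i] == b[3 - i] for i in range(4))


def head_to_head_match(a, b):
    """Predict co-aggregation for a parallel (head-to-head) dimer,
    where each domain contacts the same domain on the partner."""
    return all(a[i] == b[i] for i in range(4))


# Hypothetical chimeras: parent of EC1, EC2, EC3, EC4.
chimera_x = ("A8", "A8", "A9", "A9")
chimera_y = ("A9", "A9", "A8", "A8")
# Same as chimera_y but with EC4 swapped, giving one EC1/EC4 mismatch.
chimera_y_mismatch = ("A9", "A9", "A8", "A9")

print(head_to_tail_match(chimera_x, chimera_y))           # all contacts matched
print(head_to_head_match(chimera_x, chimera_y))           # every contact mismatched
print(head_to_tail_match(chimera_x, chimera_y_mismatch))  # single mismatch
```

Under these assumptions, a pair of chimeras that co-aggregates in the head-to-tail model would be predicted to segregate in the head-to-head model, which is the discrimination the experiments rely on.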

To further validate the model of head-to-tail binding, we carried out
mutagenesis experiments on specificity-determining regions. Since, as
shown above, for the α and γ close pairs the EC2 AB loop and the EC3
FG loop determine specificities, we reasoned that the
specificity-determining residues in the EC2 AB loop might interact
with corresponding residues in the EC3 FG loop. Notably, the largest
cluster of structurally similar docked and filtered complexes is the
only cluster that positions the EC2 AB loop near the EC3 FG loop and
is projected to position EC1 near EC4 (Figures 6A and S5A). To test
this model (Figure 6A), we relied on two observations: (1) that
arginine mutations of residue 301 in the EC3 FG loop region and
residue 116 in the EC2 AB loop region (PcdhγC5 numbering) abrogate
recognition in isoforms from different gene clusters (Figures 2B-2D)
and (2) that docked models position residue 301 and residue 116 at
close distance (less than 6 Å, Figure 6A). Hypothesizing that residues
116 and 301 are near each other in the recognition complex, we
attempted to rescue single-arginine mutants at residue 303 of PcdhαC2
or 298 of PcdhγA8 and Pcdhβ6 (analogous to PcdhγC5 301) by producing
an aspartic acid mutation of PcdhαC2 residue 118, of PcdhγA8 residue
116, or of Pcdhβ6 residue 117 (analogous to PcdhγC5 116). The designed
double mutants could, in principle, form a salt bridge at the
interface and thus might rescue recognition.

For all three isoforms (PcdhαC2, Pcdhβ6, and PcdhγA8), cells
expressing the double arginine/aspartic-acid mutants tested positive
for cell aggregation (Figure 6C), indicating that these two mutated
residues (116 and 301), located respectively on domains EC2 and EC3,
are in close proximity at the homophilic binding interface. This
observation provides strong support for a head-to-tail binding mode
where EC2 interacts with EC3 and where EC1 interacts with EC4.
Moreover, since PcdhαC2, Pcdhβ6, and PcdhγA8 are not closely related,
it is likely that the modeled interface represents the recognition
interface for other Pcdhs as well.

Figure 6. Molecular Logic of Pcdh-Mediated Cell-Cell Recognition

(A) Shown in ribbon representation is the only orientation observed for docking of the four EC1-EC3 domain structures that positions the EC2 AB loop in close proximity to the EC3 FG loop. EC2 AB loop residue 116 and FG loop residue 301 are drawn as space filling and colored red and blue, respectively. The vast majority of the docked complexes were observed to interact in this mode. See also Figure S5A.

(B) Cell aggregation assays on chimeric proteins that show EC1 interacts with EC4 and EC2 interacts with EC3. Schematic representation of the head-to-tail interaction between the domain-shuffled chimeras is shown above each panel. Mixed aggregates were formed where all interactions involve “matching” domains (panels 1-3). Separate aggregates were formed when there is a mismatch between EC1/EC4 (panel 4) or between EC2/EC3 (panel 5).

(legend continued below)

DISCUSSION

Counterintuitively, the phenomenon of neuronal self-avoidance is
initiated by trans homophilic adhesive binding between Pcdhs.
Presumably, repulsion is a consequence of the activation of downstream
signals via the ICD, which is known to interact with signaling
adaptors and kinases (Han et al., 2010; Schalm et al., 2010). This
mechanism requires that different neurons express a sufficiently
distinct set of Pcdh isoforms so that inappropriate
“self”-recognition, and subsequent repulsion, will not occur. In the
case of invertebrates, this is accomplished through the stochastic
expression of about 10-50 different alternatively spliced Dscam
isoforms in each cell (Hattori et al., 2008; Zipursky and Grueber,
2013; Zipursky and Sanes, 2010). With thousands of stochastically
generated distinct Dscam isoforms, the probability that two different
neurons express the same set of isoforms is extremely low (Miura et
al., 2013). Considering the much smaller number of distinct Pcdh
isoforms in vertebrates, isoform diversity alone cannot account for
“non-self-discrimination.”
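
A back-of-the-envelope calculation illustrates why repertoire size matters. The sketch below assumes each neuron draws k isoforms uniformly and independently from a pool of n, which is a simplification. The pool sizes are assumptions not stated in this section: 58 is the commonly cited number of clustered Pcdh isoforms in mouse, and 19,008 is the commonly cited number of Dscam1 ectodomain variants. The value k = 15 follows the example used in Figure 6G, and k = 30 is an arbitrary value inside the 10-50 range quoted above.

```python
from math import comb


def p_identical(n, k):
    """Probability that two cells draw exactly the same k-isoform set."""
    return 1 / comb(n, k)


def expected_shared(n, k):
    """Expected number of isoforms shared by two cells (hypergeometric mean)."""
    return k * k / n


def p_no_shared(n, k):
    """Probability that two cells share no isoform at all."""
    return comb(n - k, k) / comb(n, k)


# Assumed pool sizes and per-cell repertoire sizes (see lead-in).
for label, n, k in (("Pcdh", 58, 15), ("Dscam1", 19008, 30)):
    print(label,
          f"identical={p_identical(n, k):.2e}",
          f"expected shared={expected_shared(n, k):.2f}",
          f"none shared={p_no_shared(n, k):.3f}")
```

Under these assumptions, identical repertoires are improbable in both systems, but two Pcdh-expressing neurons would typically share several isoforms, whereas two Dscam1-expressing neurons would usually share none. This is consistent with the argument that Pcdh diversity alone is insufficient.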

As mentioned above, we have shown previously that an interference
phenomenon plays a crucial role in Pcdh-based non-self-discrimination
(Thu et al., 2014). In this paper, we present evidence from several
independent sources of data that suggest that Pcdh cell-cell
recognition is mediated by a mechanism that couples cis and trans
interactions. Specifically, we propose that Pcdh isoforms form
promiscuous EC6-dependent cis dimers at the cell surface that
associate specifically in trans via a stereotyped interface with
elements in domains EC1-EC4. Below, we summarize our findings and
discuss their implications for the molecular mechanisms by which
clustered Pcdhs mediate neuronal self-recognition and
non-self-discrimination.

Pcdh Homophilic Specificity Is Determined by a Head-
to-Tail trans Recognition Interface

We found that Pcdh EC1-EC3 fragments do not associate in so- 
lution, nor do they mediate homophilic cell-cell recognition in cell 
aggregation assays. Rather, we showed both in AUC measure- 
ments and cell assays that stable trans dimerization requires all 
four of the N-terminal EC1-EC4 domains. Site-directed arginine 
scanning mutagenesis and rational mutagenesis based on anal- 
ysis of sequence alignments allowed us to identify key structural 
elements in a trans interface that mediate cell-cell recognition 
between Pcdhs. 

The identification of interfacial regions in EC2 and EC3 through
computational modeling and mutagenesis experiments provided strong
constraints that made it possible to demonstrate that Pcdh trans
dimers adopt a head-to-tail orientation where EC2 interacts with EC3.
This remarkable anti-parallel trans interaction is in contrast to the
parallel trans dimerization of classical cadherins. However, for
classical cadherins, the parallel binding mode is made possible by a
significant intramolecular bend whereby the five EC domains form a
highly curved structure so that interacting membrane-distal EC1
domains from apposed cells are parallel to one another. In contrast,
since the EC1-EC3 domains in Pcdhs are straight rather than curved,
binding in parallel would require a sharp bend between the three
N-terminal and three C-terminal domains. Such a bend has been observed
only in cadherins lacking inter-domain calcium binding sites (e.g., DN
cadherin [Jin et al., 2012]), and the presence of complete calcium
binding sites between all domains renders such significant bending
highly unlikely in the case of Pcdhs.

Figure 6A shows the structure of an EC1-EC3 trans dimer ob- 
tained from our docking studies that satisfies all the constraints 
established by mutagenesis. The EC4 domain is represented as 
an ellipse in the diagram since its structure has not yet been 
determined. In addition to satisfying all the mutagenesis data 
used as constraints in the docking studies, independent evidence
supporting the model includes (1) the set of five cell aggregation
studies on γA8 and γA9 chimeras (Figure 6B) that show
that EC1 interacts with EC4 and EC2 interacts with EC3 and (2) 
the rescue experiments shown in Figure 6C that reveal that res- 
idue 116 in EC2 is in close proximity to residue 301 in EC3, as 
predicted by the head-to-tail model (Figure 6A). 

The head-to-tail model shown in the figure provides a clear 
explanation of the binding affinity and cell aggregation data. In 
the model, the free energy of binding is distributed over all four 
domain-domain interfaces, and all must be present to generate 
sufficient affinity to produce a stable homodimer. This is evident 
from the observations that three domain constructs do not 
dimerize and that interfacial mutations in only a single domain 
are sufficient to ablate binding. All EC1-EC3 ectodomain frag- 
ments studied here were monomeric, and none revealed a likely 
trans interaction. With a head-to-tail orientation, deletion of only 
one domain in EC1-EC4 effectively removes half the interface, 
providing a likely explanation for the absence of native dimer 
interactions. 

We note that the structural model itself is unlikely to be accurate 
in detail and will certainly be superseded once X-ray structures of 
all four interacting domains are available. The major significance of 
the model is the demonstration that Pcdhs dimerize in trans in a 
head-to-tail orientation with an extended interface formed from 
four inter-domain interfaces (two EC2/EC3 and two EC1/EC4). 



Figure 6 (legend continued)

(C) The EC2 domain AB region recognizes the EC3 domain FG loop. Cells expressing isoforms with single arginine mutants in the EC3 FG loop region or with double mutations (aspartate at the AB region and arginine at the FG loop) were assayed for aggregation. The double mutation rescued the non-adhesive phenotype, supporting the head-to-tail binding orientation shown in part (A).

(D) Two possible models of Pcdh interaction. A discrete tetramer composed of a dimer of dimers is observed in analytical ultracentrifugation, but we suggest that a connected ribbon of molecules can form between cells via the trans and cis interactions.

(E and F) A model for Pcdh-mediated cell-cell recognition based on formation of a superstructure defined by promiscuous cis and specific trans interactions. Growth of the chain of molecules requires matching of all isoforms; a single mismatch can terminate chain extension. Dendrites of the same neuron will have the same isoform repertoire, whereas dendrites of different neurons will differ. In this model, repulsion signaling is triggered, or achieves a sufficient level for response, only through the formation of an extended chain of Pcdhs.

(G) For the case of 15 distinct Pcdh isoforms expressed per cell, Monte Carlo simulations were used to estimate the average size of one-dimensional Pcdh assemblies between contacting cells. The average number of cis dimers that comprise such assemblies is shown on a logarithmic scale as a function of the number of mismatched isoforms. Two cases are shown: one for 15,000 total Pcdh monomers (1,000 per isoform, red), and one for 1,500 total copies (100 per isoform). The model assumes that each cell contains a stable set of cis dimers formed from the random association of monomers present in each cell. See also Figure S5B.
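
The chain-termination idea in panels (E)-(G) can be illustrated with a toy Monte Carlo. This is a minimal sketch under strong assumptions and is not the simulation used for panel (G): it grows a chain in one direction only, assumes equal copy numbers so that the free arm of each cis dimer carries a uniformly random isoform of its cell, ignores depletion of monomers, and uses a hypothetical `max_len` cap as a crude stand-in for finite copy number.

```python
import random


def simulate_chain(n_isoforms, n_mismatched, max_len, rng):
    """Return the number of cis dimers in one chain grown in one direction."""
    n_shared = n_isoforms - n_mismatched
    length = 1  # the seed cis dimer
    while length < max_len:
        # The free arm of the last dimer displays a random isoform of its cell.
        # Isoform indices >= n_shared are absent from the opposing cell.
        if rng.randrange(n_isoforms) >= n_shared:
            break  # mismatched isoform: no trans partner, chain terminates
        length += 1
    return length


def mean_chain_length(n_isoforms, n_mismatched, max_len=1000, trials=20000, seed=0):
    """Average chain length over many independent simulated chains."""
    rng = random.Random(seed)
    total = sum(simulate_chain(n_isoforms, n_mismatched, max_len, rng)
                for _ in range(trials))
    return total / trials


# 15 isoforms per cell, as in panel (G); vary the number of mismatches.
for mismatched in (0, 1, 2, 5, 15):
    print(mismatched, round(mean_chain_length(15, mismatched), 1))
```

Even in this simplified form, a single mismatched isoform sharply limits the mean assembly size relative to the fully matched case, which is the qualitative behavior the model relies on.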



Cell 163, 629-642, October 22, 2015 ©2015 Elsevier Inc. 639 







We note that the molecular dimerization logic of Pcdhs, where
different domains recognize one another through EC1/EC4 and EC2/EC3
trans interactions, is fundamentally different from that of Dscam1,
where the dimerization interface is formed from three separate
self-self interactions, Ig2/Ig2, Ig3/Ig3, and Ig7/Ig7.

Pcdhs Form c/s Dimers Mediated by EC6 

We previously provided evidence for promiscuous Pcdh EC6/ 
EC6 c/s interactions. Specifically, any single carrier isoform (P, 
y, or C-type) can mediate cell-surface delivery of a isoforms, 
which are otherwise confined within the cell, through interactions 
involving the EC6 domain (Thu et al., 2014). In addition, the pair- 
wise sequence identity between EC6 domains for all isoforms of 
Pcdhp or Pcdhy clusters averages over 90% (Thu et al., 2014), 
which is consistent with the idea of promiscuous interactions. 

We show above that the EC6 domain mediates Pcdh cis
dimerization even in the absence of trans interactions. Moreover,
as shown in Table 1, the affinity of this interaction is comparable
to or even stronger than that of the trans interaction involving
EC1-EC4. In general, cis interactions in the two-dimensional
environment of the plasma membrane would be significantly enhanced,
and the effect is strongest for membrane-proximal domains, as there
would be little entropy loss due to inter-domain flexibility upon
binding (Wu et al., 2011, 2013). Indeed, even at low surface
densities, molecules with substantial solution (3D) KDs, such as those
of Pcdhs, will likely form dimers on cell surfaces. The promiscuity
of the EC6 carrier function suggests that these dimers can form
between essentially any two Pcdh isoforms, which in turn
suggests that Pcdhs on cell surfaces exist as cis dimers formed by
pairs of different isoforms from all three subfamilies as well as the
C-type isoforms.

Assembly Termination by Mismatched Isoforms 
Distinguishes Self from Non-self 

We have shown above that full-length Pcdh ectodomains in so- 
lution form tetramers (a cis/trans dimer of dimers) mediated by 
head-to-tail trans interactions involving EC1-EC4 and a cis
interaction involving EC6. A schematic of this molecular arrangement
is shown in the left panel of Figure 6D. If Pcdhs on cell surfaces 
interacted in this manner, cellular recognition would be based on 
dimeric recognition units. However, as we have discussed in a 
previous study, dimeric recognition units are unlikely to provide 
sufficient diversity for neuronal non-self-discrimination, and 
indeed all models based on multimeric recognition units 
encounter difficulties in accounting for both self-recognition 
and non-self-discrimination (Thu et al., 2014). For this reason, 
we previously proposed an alternative recognition mechanism 
based on “junction-like” molecular assemblies at least partially 
reminiscent of those formed by classical cadherins. 

As discussed above, each Pcdh molecule forms strong
independent trans and cis interactions. This is in contrast to classical
cadherins, in which each molecule forms relatively strong trans
interactions and two weak asymmetrical cis interactions that
become stronger on cell surfaces only once the trans interactions
have been formed (Wu et al., 2011). In the case of classical
cadherins, the combination of cis and trans interactions generates a
two-dimensional lattice that corresponds to the extracellular
structure of adherens junctions (Harrison et al., 2011). In contrast,



the interactions defined here for Pcdhs suggest the formation of a
one-dimensional zipper-like structure involving symmetrical cis
and trans interactions. This structure is depicted in the right panel
of Figure 6D, which shows how each bivalent Pcdh cis dimer
could recognize two other dimers via independent trans
interactions so as to form a connected ribbon of molecules that emanate
from two apposed cell surfaces. We note that still-undiscovered
extracellular, trans-membrane, or cytoplasmic interactions may
ultimately reveal a more complex network of interactions than
the one depicted in the figure. For example, the receptor tyrosine
kinase Ret has been shown to associate with, and directly or
indirectly phosphorylate, Pcdhα and γ tyrosine residues in their ICDs
(Schalm et al., 2010). In any case, the existence of even a
one-dimensional network would provide a mechanism for interference
that does not encounter the problems faced by models based on
isolated multimeric recognition units.

Figure 6E illustrates that cells with the same isoform composi- 
tion would be able to form a large assembly upon contact. In 
contrast, cells with different isoform compositions would incor- 
porate mismatches, preventing further growth of the lattice 
(Figure 6F). If downstream signaling leading to neurite repulsion 
depends on the size of the assembly, which in turn depends on 
isoform composition, the model offers a natural mechanism for 
Pcdh interference. Indeed, there is a striking dependency of the 
size of Pcdh assemblies on the number of mismatched Pcdh iso- 
forms. Figure 6G plots the average size of such linear assemblies 
as a function of the number of mismatched isoforms between two 
contacting neurons. Assembly size is obtained from Monte-Carlo 
calculations based on a model that assumes that each cell
contains a stable set of cis dimers formed from the random
association of monomers present in each cell. When all isoforms are
identical, assembly size is limited solely by the number of copies 
of each isoform. Remarkably, the presence of even a single 
mismatched isoform is sufficient to reduce the average size of 
an assembly by at least two orders of magnitude. The results pre- 
sented in Figure 6G thus suggest that a mechanism based on 
mismatched-isoform chain termination of a linear Pcdh-assem- 
bly could provide a binary definition of self and non-self. 
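
The steep dependence on mismatches can be made plausible with a rough estimate. This is an illustrative approximation, not a result taken from the simulations: it assumes that monomers pair at random into cis dimers, that each cell expresses N isoforms at equal copy number with m of them mismatched, and that depletion of dimers is ignored. The symbols p_stop, L_arm, and S are introduced here only for this sketch. Each dimer added to an arm exposes a mismatched isoform with probability of about m/N, so the arm length is approximately geometric, and each of the two arms starts only if the corresponding monomer of the seed dimer is a shared isoform:

```latex
p_{\mathrm{stop}} \approx \frac{m}{N}, \qquad
\langle L_{\mathrm{arm}} \rangle \approx \frac{1}{p_{\mathrm{stop}}} = \frac{N}{m}, \qquad
\langle S \rangle \approx 1 + 2 \cdot \frac{N-m}{N} \cdot \frac{N}{m} = 1 + \frac{2(N-m)}{m}
```

For N = 15 and m = 1 this gives an average of roughly 29 linked cis dimers, independent of copy number, whereas for m = 0 the estimate diverges and assembly size is limited only by the number of available copies.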

While we recognize that this isoform mismatch chain-termina- 
tion model is speculative, it is consistent with the presence of 
strong independent cis and trans interactions. Such signaling
systems have been observed previously, including the
one-dimensional network of CTLA-4/B7 immune receptors (Schwartz
et al., 2001), where signaling has also been proposed to be
based on large cell-surface assemblies. Most importantly, the
model provides a mechanism whereby 58 Pcdhs can generate
the high level of diversity sufficient to allow for neuronal
self-avoidance without encountering the problems for self-recognition
that are implicit in previous models that depend on discrete
combinatorial multimeric recognition units.

EXPERIMENTAL PROCEDURES 
Protein Production and Crystallography

Proteins for crystallization or biophysical analysis were expressed in
suspension-adapted HEK293 Freestyle cells (Invitrogen) and purified by nickel affinity
and size exclusion chromatography. Pcdh crystals were grown by vapor
diffusion in 1-2 μl hanging drops, except the Pcdhβ1 EC1-EC3 crystals, which
were grown in 0.2 μl sitting drops. The PcdhγC5 EC1-EC3 P4₃2₁2 crystal
structure was solved using the MIRAS technique, while all the other Pcdh
crystal structures were solved by molecular replacement. See the Supplemental
Experimental Procedures for details.

Cell Aggregation Assays 

Pcdh expression constructs were transfected into K562 cells by electropora- 
tion. The transfected cells were grown in culture for 24 hr. Cells were then 
allowed to aggregate for 1 to 3 hr on a rocker inside an incubator at 37°C. 
The cells were then fixed in 4% PFA for 10 min, washed in PBS, and cleared 
with 50% glycerol for imaging. See the Supplemental Experimental Proce- 
dures for details. 

Sedimentation Equilibrium Analytical Ultracentrifugation 

Proteins were diluted to an absorbance at 10 mm path length and 280 nm of 
0.65, 0.43, and 0.23 absorbance units. All samples were run at four speeds: 
11,000, 14,000, 17,000, and 20,000 rpm (all EC1-EC3 constructs) or 9,000, 
11,000, 13,000 and 15,000 rpm (all EC1-EC4, EC1-EC5, and EC1-EC6 con- 
structs), respectively. Measurements were carried out at 25°C and detection 
was by UV at 280 nm. 

Monte-Carlo Simulations 

A stochastic algorithm was used to estimate the average size of Pcdh
assemblies (number of linked cis dimers) formed between a pair of neurons, each
expressing 15 distinct isoforms with 0-15 common isoforms. It was assumed
that a neuron expresses an equal number of copies of each of the 15 Pcdh
isoforms, with either 1,000 or 100 copies per isoform (i.e., 15,000 or 1,500 total
Pcdh monomers respectively). 10® simulations were performed, and in each 
simulation, stable cis dimers were randomly and independently generated
for the contacting neurons. Note that the distribution of cis dimers on both
neurons will not in general be identical even for neurons with an identical set of
monomers. A linear network was initiated by randomly choosing a dimer on
one of the cells. In the next step, a cis dimer was chosen on the second cell such
that one of its monomer constituents matched one of the monomers in the dimer
chosen on the first cell. This matching process was then repeated, with the search
for matching dimers alternating between the contacting neurons, moving from
one cell to the other as the chain extended in two directions. This extension
process was repeated until no matching dimers remained, due either to a
mismatch or to a depletion of dimers.
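
The procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the integer isoform labels, the default trial count, the way a candidate dimer is picked from the matching pool, and the dropping of an odd leftover monomer are all choices made here for the example.

```python
import random
from collections import defaultdict


def make_cis_dimers(isoforms, copies, rng):
    """Randomly pair all monomers of one cell into cis dimers."""
    monomers = [iso for iso in isoforms for _ in range(copies)]
    rng.shuffle(monomers)
    # Pair neighbours in the shuffled list; an odd leftover monomer is dropped.
    return [(monomers[i], monomers[i + 1]) for i in range(0, len(monomers) - 1, 2)]


def index_dimers(dimers):
    """Map each isoform to the indices of the dimers that contain it."""
    index = defaultdict(list)
    for k, (a, b) in enumerate(dimers):
        index[a].append(k)
        if b != a:
            index[b].append(k)
    return index


def grow_arm(free_isoform, cell, dimers, index, used):
    """Extend one arm of the chain; return the number of dimers added."""
    added = 0
    while True:
        candidates = index[cell].get(free_isoform, [])
        # Discard dimers that were already incorporated into the chain.
        while candidates and used[cell][candidates[-1]]:
            candidates.pop()
        if not candidates:  # a mismatch or depletion terminates the arm
            return added
        k = candidates.pop()
        used[cell][k] = True
        a, b = dimers[cell][k]
        free_isoform = b if a == free_isoform else a  # partner monomer is the new free end
        cell = 1 - cell  # the search alternates between the two cells
        added += 1


def assembly_size(n_mismatched, n_isoforms=15, copies=100, rng=None):
    """Size (number of linked cis dimers) of one simulated assembly."""
    rng = rng or random.Random()
    n_shared = n_isoforms - n_mismatched
    cell0 = list(range(n_isoforms))
    # Cell 1 shares the first n_shared isoforms; the rest get labels unique to it.
    cell1 = list(range(n_shared)) + [1000 + i for i in range(n_mismatched)]
    dimers = [make_cis_dimers(cell0, copies, rng), make_cis_dimers(cell1, copies, rng)]
    index = [index_dimers(d) for d in dimers]
    used = [[False] * len(d) for d in dimers]
    seed = rng.randrange(len(dimers[0]))  # chain starts from a random dimer on cell 0
    used[0][seed] = True
    a, b = dimers[0][seed]
    # Each monomer of the seed dimer starts one arm on the opposite cell.
    return 1 + grow_arm(a, 1, dimers, index, used) + grow_arm(b, 1, dimers, index, used)


def mean_assembly_size(n_mismatched, trials=200, seed=0, **kwargs):
    """Average assembly size over repeated simulations."""
    rng = random.Random(seed)
    return sum(assembly_size(n_mismatched, rng=rng, **kwargs) for _ in range(trials)) / trials
```

Calling `mean_assembly_size(m, copies=100)` for m from 0 to 15 gives a curve that can be compared qualitatively with Figure 6G; exact values will depend on the assumptions listed above.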
SUMMARY

Thermogenic brown and beige adipose tissues dissi- 
pate chemical energy as heat, and their thermogenic 
activities can combat obesity and diabetes. Herein 
the functional adaptations to cold of brown and beige 
adipose depots are examined using quantitative 
mitochondrial proteomics. We identify arginine/crea- 
tine metabolism as a beige adipose signature and 
demonstrate that creatine enhances respiration in 
beige-fat mitochondria when ADP is limiting. In mu- 
rine beige fat, cold exposure stimulates mitochon- 
drial creatine kinase activity and induces coordinated 
expression of genes associated with creatine meta- 
bolism. Pharmacological reduction of creatine levels 
decreases whole-body energy expenditure after 
administration of a β3-agonist and reduces beige
and brown adipose metabolic rate. Genes of creatine
metabolism are compensatorily induced when
UCP1-dependent thermogenesis is ablated, and creatine
reduction in Ucp1-deficient mice reduces core
body temperature. These findings link a futile cycle 
of creatine metabolism to adipose tissue energy 
expenditure and thermal homeostasis. 

INTRODUCTION 

Non-shivering thermogenesis primarily takes place in brown 
and beige adipose tissues. The ability of these depots to dissi- 
pate chemical energy has led to interest in their ability to combat 
obesity and diabetes. The thermogenic property of brown and 
beige fat relies predominantly on the actions of uncoupling 
protein 1 (UCP1) (Cannon and Nedergaard, 2004). This protein 
resides in the mitochondrial inner membrane and stimulates 
thermogenesis by dissipating the protonmotive force (Δp) and
increasing the rate of substrate flux through the mitochondrial 
respiratory chain. 

It is now appreciated that there are at least two distinct
UCP1-expressing cell types. Classical brown adipocytes are derived
from a Myf5+ lineage and are located primarily in developmentally
formed depots in the interscapular region of rodents and human
infants (Seale et al., 2008). Beige-fat cells arise primarily
from a Myf5− lineage and generally accumulate in white fat depots
upon cold challenge or with the application of a number
of different adrenergic stimuli, hormones, and peptide factors.
Much (but probably not all) of the “brown fat” of adult humans 
in the neck and supraclavicular areas has the molecular charac- 
teristics of beige fat, rather than those of the classical brown fat 
of rodents (Shinoda et al., 2015; Wu et al., 2012). On the other 
hand, the interscapular depots of human infants do indeed 
resemble the classical brown fat of rodents (Udell et al., 2013). 

Ablation experiments in mice have shown that UCP1+ cells,
taken as a whole, protect mice from the metabolic effects of
high-fat feeding. Diminution of UCP1+ cells via transgenic
expression of a toxigene first established the key role of these
cells in the regulation of metabolic health (Lowell et al., 1993).
Mice with Ucp1 deletion also develop obesity, although in this
case obesity is only observed at thermoneutrality (Feldmann
et al., 2009). Recently, beige-fat function has been ablated in
mice, with classical brown-fat function left largely intact (Cohen
et al., 2014); these animals develop moderate obesity and insulin
resistance centered on the liver.

The realization that mammals have two distinct thermogenic 
cell types raises questions regarding their similarities and their 
differences. Questions concerning fuel preferences, hormone 
sensitivities, and other key thermogenic pathways and functions 
are largely unexplored. We have performed quantitative prote- 
omics, comparing highly purified mitochondria from brown and 
beige-fat depots. The results indicate that beige-fat cells have 
a thermogenic mechanism built around a creatine-driven sub- 
strate cycle. 

RESULTS 

Mitochondrial Purification from Cold-Exposed Brown 
and Beige Fat 

We set out to compare the proteomic and bioenergetic proper- 
ties of mitochondria isolated from beige and brown adipose tis- 
sue upon induction of thermogenesis through cold exposure. To 
this end, we exposed mice to 4°C, which is sufficient to drive 
thermogenesis in subcutaneous inguinal white adipose tissue 
(iWAT) and classical interscapular brown adipose tissue (BAT).
We used western blotting to evaluate the purity of our mitochondrial
preparations. As expected, mitochondrial proteins were
enriched, whereas contaminating components of the cytoplasm
and endoplasmic reticulum were largely removed during the
purification procedure (Figures S1A and S1B). Mitochondrial
yield increased substantially (beige: 10-fold and brown: 2-fold)
following cold exposure (Figure S1C). These mitochondria from
both sources displayed properties indicative of UCP1+ organelles,
including the requirement for BSA and purine nucleotides
to acquire respiratory control (Figure S1D). These data are in
line with previous reports (Shabalina et al., 2013) and indicate
that beige-fat mitochondria from iWAT are functionally
thermogenic following cold exposure.

Quantitative Mitochondrial Proteomics Identifies 
Arginine/Creatine Metabolism as a Signature of Beige 
Adipose Tissue 

We used tandem mass spectrometry after isobaric peptide
tagging to identify protein species exhibiting differential
abundance between beige and brown adipose mitochondria from
two strains (C57BL/6 and 129SVE) of cold-exposed mice
(Figure 1A and Table S1). As shown by the global heatmap
(Figure 1B) and the principal-component analysis (Figure 1C),
brown-fat mitochondria showed marginal strain-dependent variance,
whereas beige-fat mitochondria demonstrated greater diversity
across strains. Next, beige and brown-fat mitochondrial
proteomes were stratified according to differential relative
abundance (Figure 1B), followed by identification of
beige-fat-selective biological pathways (Figure 1D). Because mitochondrial
proteins made up a larger percentage of the material after
isolating pure organelles through a sucrose gradient, relative to
crude mitochondria obtained by differential centrifugation alone
(Figures S1A and S1B), we defined bona fide mitochondrial
proteins to be those with higher abundance in the pure fraction,
relative to the crude fraction. Thus, protein abundance between
crude and pure preparations of mitochondria from beige and
brown fat was examined by mass spectrometry (Figure 1E and
Table S2). Proteins that had higher abundance in pure mitochondria,
relative to crude mitochondria, were cross-referenced to
the initial proteomics inventory (Table S1). Pathway analysis
demonstrated that components of arginine-dependent creatine
and proline metabolism were reproducibly enriched in beige-fat
relative to brown-fat mitochondria (Figures 1D and 1F). A total
of 14 proteins were identified that could be assigned to this
pathway (Table S3). Enzymes with the ability to synthesize
creatine and remove ornithine, an inhibitor of creatine biosynthesis
(Sipila, 1980), showed beige-fat selectivity (Figure 1G). There
was a strong correlation between western blotting and proteomic
quantification for beige- and brown-enriched mitochondrial
proteins (Figures S1E-S1G). Increased protein abundance
in beige-fat mitochondria was observed for components of
arginine-dependent creatine and proline metabolism, such as GATM
and CKMT2, as well as the majority of ATP synthase subunits
(Figure 1H and Table S1). Beige-fat mitochondria contained
higher levels of creatine kinase (CK) activity relative to
brown-fat mitochondria in both C57BL/6 and 129SVE strains of mice
following cold exposure (Figure S1H). Moreover, mitochondrial
CK activity was cold inducible (~2-fold) in beige fat (Figure 1I).
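
The two-step selection described above (keep proteins that are more abundant in pure than in crude mitochondria, then cross-reference them against the beige/brown comparison) can be sketched as a simple filter. This is only an illustration of the logic: the function name, the dictionary layout, the threshold parameters, and all protein labels and ratio values are hypothetical placeholders, not data from Tables S1 or S2.

```python
def bona_fide_beige_selective(pure_over_crude, beige_over_brown,
                              min_enrichment=1.0, min_selectivity=1.0):
    """Return proteins enriched in pure mitochondria AND in beige vs brown fat.

    pure_over_crude:  protein -> abundance ratio (pure / crude preparation)
    beige_over_brown: protein -> abundance ratio (beige / brown mitochondria)
    """
    # Step 1: bona fide mitochondrial proteins have a pure/crude ratio above 1.
    mitochondrial = {p for p, r in pure_over_crude.items() if r > min_enrichment}
    # Step 2: cross-reference with the beige/brown inventory.
    return sorted(p for p in mitochondrial
                  if beige_over_brown.get(p, 0.0) > min_selectivity)


# Hypothetical placeholder values, for illustration only.
pure_over_crude = {"protein_A": 1.8, "protein_B": 1.5, "protein_C": 0.4, "protein_D": 2.0}
beige_over_brown = {"protein_A": 2.1, "protein_B": 1.2, "protein_C": 1.5, "protein_D": 0.7}

selected = bona_fide_beige_selective(pure_over_crude, beige_over_brown)
print(selected)
```

In this toy input, protein_C is excluded as a likely contaminant (depleted on purification) and protein_D as mitochondrial but not beige selective.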



Given that creatine metabolism was found to be a distinct 
feature of thermogenic beige adipocytes at the protein level, 
we monitored changes in the mRNA expression of genes 
involved in creatine metabolism following 6 hr and 1 week expo- 
sure to 4°C. Transcript levels of these genes were coordinately 
elevated in response to cold in iWAT but not in BAT (Figure 1J). 
Gatm and Ckmt1 transcript abundance was similar between 
iWAT and BAT. However, GATM and CKMT1 protein expression 
was higher in beige-fat than in brown-fat mitochondria (Table 
S3). In contrast, Ckmt2 transcript levels were greater in BAT 
compared to iWAT. However, the expression level of CKMT2 
protein was found to be slightly greater in beige-fat than in 
brown-fat mitochondria (Figure 1H and Table S3). The discor- 
dance between Ckmt2 mRNA from whole-tissue lysates and 
protein abundance from isolated mitochondria is likely due to 
higher mitochondrial content in BAT than iWAT. 

We next investigated the levels of creatine and phosphocreatine
(PCr) in iWAT and BAT from mice housed at 30°C or 4°C.
Creatine levels in iWAT were elevated 2-fold following cold
exposure (Figure S1I). In contrast, although higher steady-state
creatine levels were observed in BAT, cold exposure had no
detectable effect (Figure S1I). There was no difference in PCr
levels in either iWAT or BAT in response to cold (Figure S1J),
although a modest trend toward lower PCr levels in BAT was
observed; these observations are in line with a recent report
(Grimpo et al., 2014). As a consequence of these measurements,
it is clear that the PCr/creatine ratio in iWAT was reduced
significantly in cold-exposed animals (Figure 1K), suggesting
increased creatine metabolism in beige fat. These changes in
creatine levels were not observed in skeletal muscle of the
same animals (Figure S1K).

Creatine Stimulates Respiration in Beige-Fat 
Mitochondria when ADP Is Limiting 

Based on our identification of creatine metabolism as a signa- 
ture of beige-fat mitochondria and due to the functional 
coupling of mitochondrial CK (Mi-CK) to oxidative phosphoryla- 
tion through the ATP/ADP carrier (AAC) (Jacobus and Leh- 
ninger, 1973; Wyss and Kaddurah-Daouk, 2000), we posited 
that creatine could dissipate the mitochondrial ATP pool to 
drive ADP-dependent respiration in beige-fat mitochondria. 
Such a pathway would require creatine and CK-mediated hy- 
drolysis of ATP to drive a catalytic mechanism that stimulates 
cycling of ATP production and consumption (Figure 2A). We 
therefore tested whether creatine could stimulate substrate 
cycling and increase ADP-dependent respiration in beige-fat 
mitochondria. 

Striated muscle tissues are understood to utilize creatine 
metabolism such that mitochondrial ATP and creatine generate 
PCr and ADP in a 1:1 stoichiometry (Wyss and Kaddurah- 
Daouk, 2000). The resulting PCr pool is used to drive sub- 
strate-level phosphorylation of ADP during times of ATP deficit. 
Thus, a direct prediction of this classical metabolic utilization 
of PCr is that addition of a quantity of creatine to oxidatively 
coupled mitochondria will result in a molar equivalent produc- 
tion of ADP and PCr through CK-mediated phosphotrans- 
ferase activity (Jacobus and Lehninger, 1973). Alternatively, 
if creatine drives futile substrate cycling, addition of a given

Figure 1. Characterization of Mitochondria from Brown and Beige Adipose Tissues

(A) Schematic of mitochondrial purification and quantitative proteomics workflow. 

(B) Heatmap of beige-fat and brown-fat mitochondrial proteomics data (from Table S1). Beige-enriched proteins are shown in the subset heatmap. 

(C) Principal-component analysis of the mitochondrial proteomics dataset. 

(D) Kegg pathway analysis of beige-fat-selective mitochondrial proteins from Figure 1B. 

(E) Heatmap of beige-fat and brown-fat mitochondrial proteomics data (from Table S2) after selecting proteins on the basis of an expression ratio greater than 1 in 
pure/crude mitochondria. 

(F) Kegg pathway analysis of significantly enriched beige-fat mitochondrial proteins after cross-referencing Tables S1 and S2. 

(G) Schematic of creatine synthesis and byproduct removal proteins identified by mass spectrometry. Red circles, proteins identified by mass spectrometry; gray 
circles, proteins not identified. Gly, glycine; Arg, arginine; Met, methionine; PCr, phosphocreatine; P5C, 1-pyrroline-5-carboxylic acid; Mi-CK, mitochondrial CK. 

(H) Western blot after treatment of beige- and brown-fat mitochondria with trypsin (0, 10, 25, 50, and 100 μg ml⁻¹).

(I) CK activity of mitochondria from 129SVE mice housed at 30°C or 4°C for 7 days. 

(J) Quantitative RT-PCR (qRT-PCR) from C57BL/6 mice housed at 30°C or 4°C for 6 hr (4°C— 6 hr) or 7 days (4°C— 7 days); n = 3 to 4 mice per group. 

(K) PCr to creatine (Cr) ratio in iWAT and BAT from 129SVE mice housed at 30°C or 4°C for 7 days. 

Data are presented as means ± SEM. *p < 0.05, ***p < 0.01.

Figure 2. Creatine Stimulates Respiration in Beige-Fat Mitochondria when ADP Is Limiting

(A) Model of creatine-based substrate cycling. IMS, intermembrane space; AAC, ADP/ATP carrier. 

(B) Western blot of mitochondrial proteins, n = 2 mitochondrial preparations, 15 mice per cohort. 

(C) Oxygen consumption by beige-fat mitochondria treated with and without creatine (0.01 mM) in the presence of 0.2 mM ADP. Vertical dashed line, state 3 to 
state 4 transition, n = 9 mitochondrial preparations, 15 mice per cohort. 

(D) State 3 and ADP-limiting oxygen consumption rate (OCR) of beige-fat mitochondria treated with and without creatine in the presence of 0.2 mM ADP. n = 9 
mitochondrial preparations, 15 mice per cohort. 

(E) State 3 and ADP-limiting OCR of brown-fat mitochondria treated as in (D). n = 3 mitochondrial preparations, 15 mice per cohort. 

(F) State 3 and ADP-limiting OCR of heart mitochondria treated as in (D). n = 3 mitochondrial preparations, 8 mice per cohort. 

(G) State 3 and ADP-limiting OCR of kidney mitochondria treated as in (D). n = 3 mitochondrial preparations, 15 mice per cohort. 

(H) State 3 and ADP-limiting OCR of liver mitochondria treated as in (D). n = 3 mitochondrial preparations, 2 mice per cohort. 

(I) Western blot of mitochondrial proteins from beige fat, brown fat, and heart, n = 2 mitochondrial preparations, 15 mice per cohort. 

(J) Mitochondrial creatine concentration in beige fat, brown fat, and heart. 

Data are presented as means ± SEM. *p < 0.05. 



amount of creatine would instead drive the release of a molar 
excess of ADP with respect to creatine, thus resulting in a 
surplus of oxygen consumption if ADP is limiting. In this 
case, the relationship of creatine to ADP liberation would be 
substoichiometric. 

The stoichiometric relationship between creatine and ATP 
synthase-coupled respiration was examined in mitochondria 
isolated from a panel of tissues. Western blotting demonstrated 



similar mitochondrial yields from brown fat, beige fat, heart, kid- 
ney, and liver (Figure 2B). Mitochondria were respired on pyru- 
vate and malate in the presence of varying ADP concentrations. 
These organelles exhibited the expected behavior on addition of 
sub-saturating amounts of ADP (0.01-0.2 mM), such that respi- 
ration transitioned to state 4 as ADP became limiting (Figures 
S2A and S2B). In contrast, a saturating amount of ADP (1 mM) 
resulted in state 3 respiration with no transition to state 4, as 



646 Cell 163 , 643-655, October 22, 2015 ©2015 Elsevier Inc. 








ADP was no longer limiting over the course of the incubation 
(Figure S2A). 

In the presence of sub-saturating levels of ADP, addition of a 
sub-stoichiometric amount of creatine (0.01 mM) stimulated an 
~30% increase in respiration in beige-fat mitochondria. Spe- 
cifically, creatine did not significantly affect the initial (state 3) 
ADP-stimulated rate of oxygen consumption (Figures 2C and 
2D, left panel) but enhanced the respiration rate when ADP levels 
became limiting (Figures 2C and 2D, right panel). This action of 
creatine in beige-fat mitochondria did not occur in the absence 
of exogenous ADP (Figure S2C) nor did it occur in the presence 
of the AAC inhibitor, carboxyatractyloside (Figure S2D). These 
data suggest that creatine-mediated respiration required export 
of mitochondrial matrix ATP through the AAC upon addition of 
exogenous ADP (Figure 2A). This feature of creatine-dependent 
respiration is in agreement with the established functional 
coupling of AAC with Mi-CK isoforms (Jacobus and Lehninger, 
1973; Wyss and Kaddurah-Daouk, 2000). Furthermore, incuba- 
tion of beige-fat mitochondria with saturating ADP concentra- 
tions (Figure S2E) precluded the respiration-stimulating effects 
of creatine (Figure S2F), further confirming that the stimulatory 
effects occurred only under ADP-limiting conditions. 

On the basis of the mitochondrial P/O ratio (Watt et al., 2010),
we calculated the amount of additional ATP that was syn- 
thesized by addition of creatine to beige-fat mitochondria. Cre- 
atine drove an increase in oxygen consumption equivalent to a 
7.82-fold (±2.12) excess phosphorylation of ADP. These results 
indicate that in beige adipose mitochondria, creatine acts 
substoichiometrically (with respect to ADP phosphorylation) to 
increase mitochondrial ATP synthesis and stimulate substrate 
flux through the mitochondrial respiratory chain. Addition of 
sub-stoichiometric amounts of creatine to mitochondria iso- 
lated from brown fat, heart, kidney, or liver (Figure 2B) had no 
effect on ADP-dependent respiration (Figures S2G and 2E-2H). 
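
The arithmetic behind this kind of estimate can be sketched as follows. This is one plausible reading of the calculation, not the authors' worked numbers: the function name is hypothetical, the input quantities in the example are made up, and the default P/O ratio of 2.7 ATP per oxygen atom for NADH-linked substrates is an assumption, since the value actually used is not stated in this section.

```python
def excess_phosphorylation_ratio(extra_o2_nmol, creatine_nmol, p_o_ratio=2.7):
    """ADP phosphorylated per creatine molecule added.

    extra_o2_nmol: additional O2 consumed (nmol O2) attributed to creatine
    creatine_nmol: amount of creatine added (nmol)
    p_o_ratio:     ATP made per oxygen ATOM reduced (assumed value)
    """
    extra_oxygen_atoms = 2.0 * extra_o2_nmol   # each O2 contains two oxygen atoms
    extra_atp = extra_oxygen_atoms * p_o_ratio  # ADP phosphorylated by the extra respiration
    return extra_atp / creatine_nmol


# Made-up example: 10 nmol creatine triggering 10 nmol of extra O2 consumption.
ratio = excess_phosphorylation_ratio(extra_o2_nmol=10.0, creatine_nmol=10.0)
print(round(ratio, 2))
```

A ratio near 1 would match the classical 1:1 stoichiometry of creatine to ADP, whereas a ratio well above 1 means that each creatine molecule supports several rounds of ADP phosphorylation, as expected for futile substrate cycling.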

Striated muscle contains a large quantity of creatine (Fitch and 
Chevli, 1980), and so endogenous mitochondrial creatine levels 
in beige and brown fat were compared to heart following purifi- 
cation of a comparable yield of organelles from these tissues 
(Figure 2I). Figure 2J demonstrates that mitochondrial creatine
levels were similar between tissues following purification, indi- 
cating that the lack of a respiration-enhancing effect in heart 
mitochondria was not due to a high amount of endogenous 
creatine. Together, these data indicate that creatine stimulates 
respiration in beige-fat mitochondria in a substoichiometric 
manner with respect to ADP, specifically when ADP is limiting; 
this is consistent with a model of creatine-driven substrate 
cycling (Figure 2A). 

To examine thermogenesis by creatine-driven substrate 
cycling, we employed differential scanning calorimetry (Ricquier 
et al., 1979). As expected, chemical uncoupling by FCCP 
induced thermogenesis, and this signal was inhibited by rote- 
none and antimycin; the signal was similar to that obtained 
when mitochondria were omitted from the reaction (Figure 
S2H). Importantly, addition of creatine to beige-fat mitochondria
increased ADP-dependent thermogenesis by ~30% relative to
organelles incubated with ADP alone (Figure S2I). This is a direct
demonstration of thermogenesis through creatine metabolism 
in beige-fat mitochondria. 



Creatine Metabolism in Adipose Tissue Contributes to 
Energy Expenditure and Thermal Homeostasis In Vivo 

As the role of creatine metabolism in thermogenic fat tissues is
essentially unexplored in vivo, we systematically assessed its
role in beige and brown adipose metabolic functions. First, we
examined whether creatine regulates oxidative metabolism in
beige and brown fat. To this end, we utilized β-guanidinopropionic
acid (β-GPA), a creatine analog that is well established
to inhibit creatine transport and to reduce creatine levels in
cultured cells and tissues (Fitch and Chevli, 1980). A diet
supplemented with a typical dose of β-GPA (2%) resulted in reduced
food intake and in weight loss, as previously reported (Oudman
et al., 2013); this compromised subsequent metabolic analyses.
However, when mice were given daily intraperitoneal injections
with a lowered dose of β-GPA (0.4 g kg⁻¹) for 4 days during
cold exposure, body weight and fat-pad weight were unaffected
(Figures S3A and S3B). Creatine levels were reduced by
approximately 50% in iWAT and BAT and by 15% in gastrocnemius
muscle (Figure 3A). Mitochondrial respiratory chain protein
expression was not altered (Figure 3B). To determine the
contribution of creatine metabolism in brown/beige adipose to
whole-body energy expenditure, we utilized the β3-adrenergic receptor
agonist CL 316,243 (CL), a well-known activator of adipose
thermogenesis (Bloom et al., 1992; Granneman et al., 2003).
Movement, food intake, and oxygen consumption were monitored
in CL-treated mice that were co-treated with either vehicle
or β-GPA. There was no difference in ambulatory movement,
food intake, or fuel utilization between vehicle- and β-GPA-treated
animals (Figures 3C, 3D, and S3C). As expected, we detected a
54% increase in the metabolic rate of mice following CL
injection. Strikingly, a reduction in creatine through the administration
of β-GPA diminished the CL-induced increase in whole-body
oxygen consumption by ~40% compared to vehicle treatment
(Figure 3E).

The reduction in whole-body oxygen consumption prompted
an examination of which tissues were most affected by β-GPA,
in terms of whole-tissue respiration. Thus, another cohort of
mice, housed at 23°C, was separated into four groups: vehicle,
β-GPA, CL, or CL + β-GPA. β-GPA had no effect on the
respiration of any tissue examined in the absence of a browning
stimulus (Figure 3F). CL treatment provided a powerful browning/
beiging stimulus, resulting in a 5-fold increase in iWAT oxygen
consumption, a 33% increase in BAT respiration, a 9-fold
elevation in PgWAT respiration, and no effect on Gstrc respiration
(Figure 3F). Reducing creatine levels with β-GPA had a profound and
significant effect on beige iWAT, resulting in a 34% reduction in
oxygen consumption (Figure 3F). BAT oxygen consumption was
reduced by 18% following β-GPA treatment, PgWAT oxygen
consumption was decreased by 15%, and no effect
on Gstrc tissue respiration was detected (Figure 3F). Although
the β-GPA-dependent reduction in BAT respiration did not reach
statistical significance, the absolute decrease was larger than
in any other tissue examined, and so in addition to iWAT, BAT
may well contribute to the effects of β-GPA observed in vivo.

We next examined the contribution of beige adipocytes within
iWAT to creatine-dependent oxidative metabolism. Mice with an
adipose-selective deletion of Prdm16 (Adipo-PRDM16 KO) have
disrupted beige adipose function (Cohen et al., 2014). Following
7 days of cold exposure, creatine reduction with β-GPA
significantly decreased iWAT oxidative metabolism from control
Prd 171 mice but had no further effect on the already 
reduced respiratory rate of iWAT from Adipo-PRDM16 KO 
mice (Figure 3G). No significant effect of creatine reduction 
was detected in any other tissue in either genotype, although 
again, there was a modest reduction in the respiration of BAT 
from mice (Figure 3G). Therefore, using two models 

of browning/beiging, these data demonstrate that creatine 
reduction attenuates the oxidative metabolism of thermogenic 
adipose tissues in vivo. 

Respiration of Cultured Human Brown Adipocytes Is 
Regulated by Creatine 

Much, though not all, of the brown fat observed in adult humans
has characteristics of murine beige adipocytes (Lidell et al.,
2013; Shinoda et al., 2015; Wu et al., 2012). Thus, it was of
interest to examine whether creatine contributed to oxidative
metabolism in the recently cloned human brown adipocytes (Shinoda
et al., 2015). Both Ckmt1 and Ckmt2 mRNA levels were higher in
two human clonal brown adipocyte lines compared to human
white adipocytes (Figure 3H). When human brown adipocytes
were treated with β-GPA, basal respiration was significantly
reduced by 45%; this reduction was completely rescued by
creatine supplementation (Figure 3I), demonstrating the specificity
of β-GPA. Next, RNAi-mediated knockdown was used to assess
the function of Ckmt1 in these adipocytes. Transfection of small
interfering RNAs (siRNAs) targeting Ckmt1 resulted in an ~40%
reduction in Ckmt1 mRNA levels, relative to control
siRNA-transfected cells (Figure S3D). Knockdown of Ckmt1 resulted
in a 26% reduction in basal respiration, and creatine
supplementation did not rescue this effect (Figure 3J). These data
demonstrate that creatine and CKMT1 regulate oxidative properties of
isolated human brown adipocytes.

Compensatory Regulation of Creatine Metabolism 
Components and UCP1 in Adipose Tissues In Vivo 

The ability of creatine to stimulate thermogenesis in an apparently
sub-stoichiometric manner with respect to ADP-dependent
respiration suggested that this creatine-based futile cycle
could be an important mechanism of thermoregulation in vivo,
independent of UCP1. To begin to test this supposition, we
re-examined the longstanding and important observation that
mice lacking Ucp1 can maintain thermal homeostasis at 4°C if
they are gradually acclimated to this temperature (Golozoubova
et al., 2001). These data demonstrated that alternative thermogenic
pathway(s) compensate for the lack of UCP1 (Ukropec
et al., 2006). We therefore gradually acclimated Ucp1-deficient
(Ucp1−/−) mice to the cold over a period of 21 days. In addition,
we reduced creatine levels by administration of β-GPA to
both Ucp1+/+ and Ucp1−/− animals. As shown previously (Ukropec
et al., 2006), cold-exposed Ucp1−/− mice had elevated
expression of classical thermogenic genes, such as Dio2 and
Pgc-1α, in iWAT compared to Ucp1+/+ mice (Figure 4A). Pgc-1α
was also induced in BAT from Ucp1−/− mice (Figure S4A).
These data suggest a strong induction of a thermogenic phenotype
in the iWAT of Ucp1−/− mice. Slc6a8 and Ckmt1 transcripts
were elevated in iWAT of cold-exposed Ucp1−/− mice, relative to
Ucp1+/+ littermates (Figure 4B), whereas Slc6a8 and Ckmt2 transcript
levels were higher in BAT from the same animals
(Figure S4B).

Just as creatine metabolism genes were induced in
Ucp1−/− mice, classical thermogenic genes, such as Ucp1 and
Dio2, were significantly elevated in iWAT of cold-exposed mice
after creatine reduction with β-GPA (Figure 4C). A modest
increase in Ucp1 and Pgc-1α mRNA was observed in BAT of
cold-exposed Ucp1+/+ mice following creatine reduction with
β-GPA (Figure S4C). UCP1 protein levels were also elevated in
response to β-GPA in iWAT of Ucp1+/+ mice but did not change
in the BAT of the same animals (Figure 4D).

We also examined the compensatory regulation between creatine
metabolism genes and Ucp1 in human brown adipocytes.
In these cell lines, Ckmt1 was more abundant than Ckmt2 at the
mRNA level (Figure 3H). Transfection of siRNAs targeting human
Ckmt1 resulted in an ~40% reduction in Ckmt1 mRNA levels,
relative to control siRNA-transfected cells (Figure 4E). Strikingly,
Ucp1 mRNA was elevated 2.5-fold in cells transfected with
Ckmt1-targeted siRNAs relative to control siRNA-transfected
cells. Forskolin treatment of human brown-fat cells resulted
in a nearly 8-fold induction of Ucp1 mRNA (Figure 4E), and
combining forskolin treatment with Ckmt1 knockdown resulted
in a 15-fold elevation in Ucp1 transcript abundance above
non-treated si-Ctrl cells (Figure 4E). Furthermore,
si-Ckmt1-transfected cells had an elevated abundance of Dio2 and
Pgc-1α, relative to si-Ctrl transfected cells, following forskolin
treatment (Figure 4E). Taken together, these data demonstrate



Figure 3. Creatine Metabolism in Adipose Tissue Contributes to Energy Expenditure and Thermal Homeostasis In Vivo

(A) Creatine concentration in iWAT (beige), BAT, and gastrocnemius muscle (Gstrc) from cold-exposed C57BL/6 mice treated with four daily injections of vehicle
or β-GPA (0.4 g kg⁻¹); n = 5 mice per group.

(B) Western blot of iWAT (beige) and BAT from animals treated as in (A); n = 3 mice per group.

(C) Movement, n = 8 mice per group. CL (0.2 mg kg⁻¹) was co-injected intraperitoneally with vehicle (saline) or β-GPA (0.4 g kg⁻¹).

(D) Food intake, n = 8 mice per group, treated as in (C).

(E) Oxygen consumption, n = 8 mice per group, treated as in (C).

(F) OCR of minced tissues following four daily injections of vehicle, β-GPA (0.4 g kg⁻¹), CL (0.2 mg kg⁻¹), or β-GPA + CL; n = 5 mice per group.

(G) OCR of minced tissues from Prdm16lox/lox and Adipo-PRDM16 KO mice after 7 days cold exposure. β-GPA was administered on the last 4 days of cold
exposure; n = 5 to 7 mice per group.

(H) qRT-PCR of Ckmt1 and Ckmt2 in two human brown adipocyte lines (#11-1 and 11-3) and one human white adipocyte line (#11). Raw Ct values are embedded
in the bars.

(I) OCR of human brown adipocytes treated with vehicle, β-GPA (2 mM), and creatine (5 mM).

(J) OCR of human brown adipocytes transfected with si-Ctrl or si-Ckmt1. Creatine used at 5 mM.

Data are presented as means ± SEM. *p < 0.05, **p < 0.01, ***p < 0.0001.



Cell 163, 643-655, October 22, 2015 ©2015 Elsevier Inc. 649 







[Figure 4 graphic. Recoverable panel titles: iWAT (beige): Thermogenesis; iWAT (beige): Creatine metabolism; iWAT (beige) and BAT, vehicle versus β-GPA; Tissue respiration in Ucp1−/− mice (iWAT (beige), BAT, PgWAT, Gstrc). Legend key: vehicle or β-GPA at 30°C or 4°C.]

Figure 4. Creatine Metabolism Regulates UCP1-Independent Thermal Homeostasis In Vivo

(A) qRT-PCR from iWAT of 4°C-acclimated Ucp1+/+ and Ucp1−/− mice; n = 5 mice per group.

(B) qRT-PCR of creatine metabolism genes from same samples as in (A).

(C) qRT-PCR from iWAT of 4°C-acclimated mice treated with four daily injections of vehicle or β-GPA (0.4 g kg⁻¹); n = 5 mice per group.

(D) Western blot from iWAT and BAT from mice treated as in (C). Vinculin (VCL), loading control; n = 3 mice per group.

(E) qRT-PCR of clonal human brown adipocytes (line #11-1). siRNAs targeted against control (si-Ctrl) or Ckmt1 (si-Ckmt1). Forskolin was used at 10 μM for 4 hr.
(The data showing si-Ctrl and si-Ckmt1, without forskolin treatment, are the same data shown in Figure S3D); n = 3 per group.

(F) Body temperature of Ucp1+/+ and Ucp1−/− mice treated with vehicle or β-GPA (0.4 g kg⁻¹); n = 7 to 8 mice per group.

(G) Representative electromyogram (EMG) traces, measured at 4°C, of 4°C-acclimated Ucp1−/− mice treated as in (F).

(H) Frequency of shivering bursts quantified from data in (G).

(I) Representative EMG traces, at 30°C and following 15-45 min at 4°C, of 30°C-acclimated wild-type C57BL/6 mice treated as in (F).

(legend continued on next page) 
that genes involved in creatine metabolism are increased in the
absence of UCP1, and expression of classical thermogenic 
genes is elevated when creatine metabolism is disrupted. These 
data suggest a compensatory relationship between UCP1- and 
creatine-dependent bioenergetics in mice and humans. 

Creatine Regulates UCP1-Independent Thermal
Homeostasis In Vivo

To test the role of creatine metabolism in adaptive thermogenesis
in vivo, we examined the body temperature of cold-acclimated
Ucp1+/+ and Ucp1−/− mice that were treated with either
β-GPA or vehicle for 4 days. Ucp1−/− mice weighed slightly
less than Ucp1+/+ mice (Figure S4D). Body temperature of
Ucp1−/− mice was higher than that of Ucp1+/+ animals at
23°C and 4°C (Figure 4F) (Ucp1+/+, 34.4 ± 0.1 rC and Ucp1−/−,
34.7 ± 0.14°C, at 4°C). This temperature difference between
genotypes is consistent with previous work (Liu et al., 2003).
Strikingly, body temperature of the Ucp1−/− animals was significantly
lower when endogenous creatine levels were reduced with
β-GPA (Figure 4F), suggesting that in the absence of UCP1,
creatine regulates a larger proportion of cold-induced
thermogenesis than when UCP1 is present.

Creatine contributes to skeletal muscle metabolism, and so it
was critical to determine whether the effect of β-GPA on body
temperature was due to reductions in shivering thermogenesis.
Thus, we used electromyography (EMG) to monitor shivering
directly. Mice housed at 30°C showed minimal EMG activity.
However, within 15 to 30 min of exposure to 4°C, bursts of
shivering were detected (Figure S4E). Next, we used EMG
on Ucp1−/− mice that had been acclimated to 4°C, and similar
to previous work (Golozoubova et al., 2001), we detected robust
shivering activity (Figure S4E). These measurements were
confirmed as shivering bursts because they were abolished
with the nicotinic acetylcholine receptor inhibitor D-tubocurarine
(Figure S4E). The capacity for shivering of cold-acclimated
Ucp1−/− mice was not altered by β-GPA treatment (Figures 4G
and 4H). We next examined shivering thermogenesis and body
temperature in wild-type mice that had been acclimated to
30°C and acutely cold challenged at 4°C following administration
of vehicle or β-GPA for 4 days. Under these conditions, reduced
creatine levels did not alter shivering capacity (Figures 4I and 4J)
or body temperature (Figure S4F). Analysis of serum CK activity
was used as an independent method to monitor shivering.
Although serum CK activity was elevated in Ucp1−/− mice at
18°C, CK activity was elevated to a similar extent in Ucp1+/+
and Ucp1−/− mice when ambient temperature was reduced
to 10°C and 4°C (Figure S4G). Importantly, β-GPA administration
had no effect on serum CK activity in either genotype. Together,
these results indicate that the reduced body temperature of
β-GPA-treated Ucp1−/− mice was not due to aberrant shivering,
and that the metabolic and thermogenic adaptations that
exist in the absence of UCP1 cannot be explained solely by
shivering thermogenesis. Examination of respiration by isolated
tissues indicated that despite a lower respiratory capacity of
iWAT from Ucp1−/− mice, relative to Ucp1+/+ mice (compare
Figure 4K to Figure S4H), β-GPA treatment reduced oxygen
consumption in this depot from Ucp1−/− mice by
40% (Figure 4K). No significant effect on respiration due to
creatine reduction was observed in any other tissue examined;
however, a modest reduction was observed in BAT (Figures 4K
and S4H).

A Screen of Mitochondrial Phosphatases to Identify 
Candidate Regulators of a Creatine-Driven 
Substrate Cycle 

Taken together, our data indicated that creatine drives high-energy
phosphate-dependent substrate cycling, which requires
ADP-stimulated respiration. This led us to consider candidate
molecules in adipose mitochondria that could carry out this
process. To this end, we mined our mitochondrial proteomics
inventory for putative phosphatases/hydrolases and identified
11 candidate phosphatases. Because several known members
of creatine metabolism had elevated expression in Ucp1−/−
mice (Figure 4B), we posited that the expression of a key
enzyme(s) with phosphatase activity might also be elevated in
the absence of UCP1. Transcript levels for all 11 of these
phosphatases were measured in iWAT and BAT of cold-exposed
Ucp1+/+ and Ucp1−/− mice. One gene, Phospho1 (phosphatase
orphan 1), was dramatically altered at the mRNA level, with
an 8-fold increase in iWAT of Ucp1−/− animals (Figure 5A). In
contrast, none of the candidates were increased in BAT at
the mRNA level (Figure S5A). The consistent identification of
PHOSPHO1 in our mitochondrial proteomic experiments suggested
that PHOSPHO1 is at least partly associated with adipose
mitochondria. Strikingly, protein expression of PHOSPHO1
was dramatically higher in the iWAT and BAT of Ucp1−/− mice,
relative to Ucp1+/+ mice (Figure 5B). PHOSPHO2, the closest
homolog of PHOSPHO1, displayed the opposite expression
pattern (Figure 5B). Proteomic quantification of PHOSPHO1
from iWAT and BAT of cold-acclimated Ucp1+/+ and Ucp1−/−
mice also demonstrated results similar to those of western
blotting (Figure S5B). Interestingly, creatine reduction with β-GPA
resulted in a small increase in Phospho1 mRNA levels in iWAT
from cold-exposed Ucp1+/+ mice (Figure 5C), and a modest
trend in the same direction was detected in BAT (Figure S5C).
We also examined the compensatory regulation between Phospho1
and Ucp1 in human brown adipocytes. Transfection of
siRNAs targeting human Phospho1 resulted in an ~50% reduction
in Phospho1 mRNA levels, relative to control siRNA-transfected
cells (Figure 5D). Ucp1 mRNA was significantly elevated
1.5-fold in cells transfected with Phospho1-targeted siRNAs
relative to control siRNA-transfected cells, and no change was
observed for the differentiation marker Pparg (Figure 5D).
Therefore, based on the reciprocal expression pattern between



(J) Frequency of shivering bursts quantified from data in (I).

(K) OCR of minced tissues from 4°C-acclimated Ucp1−/− mice treated as in (F). n = 16 to 17 mice per group (iWAT and BAT); n = 5 to 6 mice per group (PgWAT
and Gstrc).

Data are presented as means ± SEM. *p < 0.05, ***p < 0.0001.






[Figure 5 graphic. Recoverable panel titles: (A) iWAT (beige): Candidate phosphatases; (B) western blot of iWAT (beige) and BAT by Ucp1 genotype, Veh versus GPA, with PHOSPHO2 and VCL bands; (C) iWAT (beige): Candidate phosphatases; (E) Primary inguinal adipocytes. Legend key: shLacZ, shPhospho1-#1, shPhospho1-#2.]

Figure 5. A Screen of Mitochondrial
Phosphatases with Ucp1-Deficient Mice
Identifies PHOSPHO1 as a Regulator of
Adipocyte Respiration

(A) qRT-PCR of candidate phosphatases in iWAT
of 4°C-acclimated Ucp1+/+ and Ucp1−/− mice; n =
5 mice per group.

(B) Western blot of PHOSPHO1 and PHOSPHO2
from 4°C-acclimated Ucp1+/+ and Ucp1−/− mice,
treated with vehicle or β-GPA (0.4 g kg⁻¹). Vinculin
(VCL), loading control.

(C) qRT-PCR of candidate phosphatases in iWAT
of 4°C-acclimated wild-type mice treated as in (B);
n = 5 mice per group.

(D) qRT-PCR of clonal human brown adipocytes
(line #11-1) treated with si-Ctrl or si-Phospho1; n =
3 per group.

(E) qRT-PCR of primary mouse inguinal adipocytes
after Phospho1 knockdown (shPhospho1-#1 and
shPhospho1-#2); n = 4 to 5 per group.

(F) OCR of primary mouse inguinal adipocytes
treated as in (E); n = 5 to 7 per group.

(G) Coomassie stain of SDS-PAGE demonstrating
PHOSPHO1 protein expression and purification.
M, molecular weight marker; NI, non-induced;
I, induced; FT, flow-through; W, wash; E1-E3,
elutions 1-3. Arrows indicate recombinant
PHOSPHO1.

(H) Specific activity of PHOSPHO1 toward PCho
(0.5 mM) and PCr (0.5 to 10 mM) exposed to
various buffer pH and divalent metals.

Data are presented as means ± SEM. *p < 0.05,
***p < 0.0001.
Phospho1 and Ucp1 in mouse tissue and human cultured cells,
PHOSPHO1 was considered as a candidate for involvement in a
creatine-driven substrate cycle.

PHOSPHO1 Regulates Oxidative Metabolism in Primary
Inguinal Adipocytes

Phospho1 displayed an expression pattern that would be predicted
for a factor involved in an alternative, UCP1-independent
pathway of energy expenditure. At the mRNA level, its
steady-state expression was modestly higher in murine primary
inguinal adipocytes, relative to primary
brown-fat cells, whereas Ucp1 demonstrated
the opposite expression pattern
(Figure S5D). Therefore, the role of
PHOSPHO1 in adipocyte bioenergetics
was examined. Using adenoviral-mediated
shRNA delivery, we achieved
~90% knockdown of Phospho1 at the
mRNA level in primary inguinal cells,
with no mRNA changes detected for its
homolog, Phospho2, or the differentiation
markers Fabp4 and Pparg2
(Figure 5E). Reduced PHOSPHO1 levels
were also observed at the protein level
(Figure S5E). Both shRNAs used to
knock down PHOSPHO1 significantly
reduced basal oxygen consumption but had no effect on proton
leak; the second shRNA also reduced maximal respiration
(Figure 5F). Knockdown of Phospho1 in primary brown adipocytes
(Figure S5F) decreased basal respiration but had no
effect on proton leak or maximal respiration (Figure S5G).
Thus, reduced Phospho1 levels attenuated oxygen consumption
in primary inguinal and brown adipocytes, although the relative
effect was greater in inguinal cells. The more pronounced
effect on basal respiration (largely regulated by ATP demand)
than on proton leak is consistent with a role for PHOSPHO1 in
Figure 6. Models of Creatine-Driven Futile
Substrate Cycling

(A) Model of creatine-driven futile substrate
cycling based on direct hydrolysis of PCr.

(B) Model of creatine-driven futile substrate
cycling based on multiple phosphotransfer events
catalyzed by multiple enzymes (Enz1-Enzn).



ATP-coupled oxygen consumption. One interpretation of these
data was that PHOSPHO1 regulates substrate cycling by liberating
the high-energy phosphate from PCr. Therefore, we purified
recombinant PHOSPHO1 (Figure 5G) in order to examine
whether PCr can be directly hydrolyzed by PHOSPHO1 in vitro.
PHOSPHO1-specific activity toward phosphocholine was similar
to that of a prior report (Roberts et al., 2004). However, there was
limited phosphatase activity toward PCr when tested under
various buffer conditions (Figure 5H). Taken together, these
data suggest that if PHOSPHO1 is involved in a creatine-driven
substrate cycle, it likely regulates high-energy phosphate
metabolism downstream of the phosphotransfer event that utilizes
PCr (Figure 6).

DISCUSSION 

Although UCP1 is critical for optimal thermogenesis, Ucp1−/−
animals can survive cold temperatures if gradually acclimated
(Golozoubova et al., 2001; Meyer et al., 2010; Ukropec et al., 2006).
Furthermore, it has been shown with adrenergic stimulation and
peptide factors that the thermogenic responses of Ucp1+/+ and
Ucp1−/− mice are similar (Granneman et al., 2003; Grimpo
et al., 2014; Veniant et al., 2015). These findings imply the
presence of UCP1-independent thermogenic mechanisms.
Glycerol-3-phosphate shuttle activation and lipid turnover (Flachs
et al., 2011; Grimpo et al., 2014) have been posited to act
independently of UCP1. Calcium cycling has been proposed to be
an additional source of thermogenesis in BAT (Ukropec et al.,
2006) and is a well-established thermogenic mechanism in the
extraocular heater muscle cells of certain fish and in mammalian
skeletal muscle (Bal et al., 2012; Block et al., 1994). Interestingly,
large reductions in creatine levels have previously been linked to
deregulated thermal homeostasis in rats (Wakatsuki et al., 1996;
Yamashita et al., 1995), through unknown mechanisms.

The stoichiometric relationships observed between creatine, 
ADP, and oxygen consumption suggest a creatine-driven sub- 
strate cycle. By liberating a molar excess of ADP, with respect 
to the amount of added creatine, beige-fat mitochondria enhance 
respiration by maintaining a state of ATP synthesis, as shown 
by the increased rate of oxygen consumption when ADP was 
limiting. Reducing creatine levels by even 50% in beige and 
brown fat was associated with a blunted response to β3-agonism
at the level of whole-body energy expenditure. A role for creatine 
and creatine kinase in brown adipose tissue metabolism has been 
posited previously (Berlet et al., 1976; Terblanche et al., 1998; 
Watanabe et al., 2008). Our data suggest that creatine meta- 
bolism regulates energy expenditure in both beige and brown fat. 



Gene-expression analyses demonstrated a clear compensatory
regulation between classical thermogenic genes and the
genes involved in creatine metabolism in murine tissues and
human adipocytes. Consistent with this reciprocal relationship,
a reduction in creatine levels attenuated oxygen consumption
of iWAT from Ucp1−/− mice and perturbed thermal homeostasis
of these animals without diminishing shivering. The expression of
Phospho1 was elevated at the mRNA and protein level in
Ucp1-deficient animals and thus became the focus of examination.
Although the effect of PHOSPHO1 knockdown on adipocyte
bioenergetics is consistent with a role for it in high-energy
phosphate metabolism, PHOSPHO1 does not hydrolyze PCr in vitro,
at least under any conditions we tested. Interestingly,
phosphoethanolamine (PEtn), a direct PHOSPHO1 substrate (Roberts
et al., 2004), has been demonstrated to inhibit ADP-stimulated
mitochondrial respiration (Gohil et al., 2013), and so PHOSPHO1
could affect oxygen consumption by regulating PEtn levels.

The data presented here indicate that creatine metabolism 
plays an important role in adipose energy expenditure in vivo. 
Most likely, creatine facilitates the regeneration of ADP through 
futile hydrolysis of PCr. This could occur in a single enzymatic 
step, whereby PCr would be the direct substrate of a so-called 
PCr phosphatase (Figure 6A), or via multiple phosphotransfer re- 
actions prior to phosphate hydrolysis from a currently unknown 
phosphometabolite (Figure 6B). 

Similar to what was shown in the recently cloned human brown 
adipocytes (Shinoda et al., 2015), CKMT1 and CKMT2 expres- 
sion is enriched in human BAT (Svensson et al., 2011). If creatine 
metabolism plays a substantial role in thermogenesis in humans, 
as suggested by the work here with isolated cells, it could 
open up possibilities to manipulate energy expenditure in pa- 
tients with metabolic diseases by new drugs or even with dietary 
supplementation. 

EXPERIMENTAL PROCEDURES 

Sucrose Gradient-Purified Mitochondria 

Sucrose was dissolved to a concentration of 1 M and 1.5 M in gradient buffer
(10 mM HEPES, pH 7.8, 5 mM EDTA, and 2 mM DTT) and layered in a poly- 
allomer centrifuge tube. Mitochondrial samples were loaded on top of the 
gradient and ultracentrifuged at 32,000 rpm for 1 hr at 4°C. Intact organelles 
that banded at the interface of the sucrose cushion were carefully extracted, 
washed twice in SHE buffer, and stored at -80°C until LC-MS/MS or western 
blot analyses. 
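
As a small worked example of preparing the two gradient layers described above, the following Python sketch converts a target molarity into grams of sucrose. The layer volume is not stated in the procedure, so the 10 ml used here is purely an illustrative assumption, and the function name `sucrose_grams` is hypothetical.

```python
# Molar mass of sucrose (C12H22O11) in g/mol.
SUCROSE_MW = 342.30


def sucrose_grams(molarity, volume_ml):
    """Grams of sucrose needed for `volume_ml` of solution at `molarity` (mol/L)."""
    return molarity * SUCROSE_MW * volume_ml / 1000.0


if __name__ == "__main__":
    # Assumed 10 ml per layer; the source does not give layer volumes.
    for molarity in (1.0, 1.5):
        grams = sucrose_grams(molarity, 10.0)
        print(f"{molarity} M layer, 10 ml: {grams:.3f} g sucrose")
```

The sucrose would be brought to final volume in the gradient buffer rather than added to a fixed volume of buffer, since dissolved sucrose contributes substantially to the solution volume.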

Mitochondrial Respiration 

Mitochondrial respiration was determined using an XF24 Extracellular Flux
Analyzer (Seahorse Bioscience) using 15 μg mitochondrial protein in a buffer
containing 50 mM KCl, 4 mM KH2PO4, 5 mM HEPES, 1 mM EGTA, 4%
BSA, 10 mM pyruvate, 5 mM malate, and 1 mM GDP. Mitochondria were
plated and centrifuged at 2,000 × g for 20 min to promote adherence to the XF24
V7 cell-culture microplate. Uncoupled and maximal OCR were determined
using oligomycin (14 μM) and FCCP (10 μM). Rotenone and antimycin A
(4 μM each) were used to inhibit complex 1- and complex 3-dependent
respiration.

Primary Inguinal Adipocyte Differentiation 

Primary inguinal preadipocytes were counted and plated in the evening at
20,000 cells per well of a Seahorse plate. The following morning, inguinal
preadipocytes were induced to differentiate with 1 μM rosiglitazone, 0.5 mM
isobutylmethylxanthine (IBMX), 1 μM dexamethasone, and 5 μg ml⁻¹ insulin. Cells
were re-fed every 48 hr with 1 μM rosiglitazone and 5 μg ml⁻¹ insulin. Cells
were fully differentiated by day 5 post-induction.

Primary Brown Adipocyte Differentiation 

Primary brown preadipocytes were counted and plated in the evening at
15,000 cells per well of a Seahorse plate. The following morning, brown
preadipocytes were induced to differentiate with 1 μM rosiglitazone, 0.5 mM IBMX,
5 μM dexamethasone, 0.114 μg ml⁻¹ insulin, 1 nM T3, and 125 μM indomethacin.
Cells were re-fed every 48 hr with 1 μM rosiglitazone and 0.5 μg ml⁻¹
insulin. Cells were fully differentiated by day 5 post-induction.

SUPPLEMENTAL INFORMATION 

Supplemental Information includes Supplemental Experimental Procedures,
five figures, and three tables and can be found with this article online at
http://dx.doi.org/10.1016/j.cell.2015.09.035.
SUMMARY

While isolated motor actions can be correlated with 
activities of neuronal networks, an unresolved prob- 
lem is how the brain assembles these activities into 
organized behaviors like action sequences. Using 
brain-wide calcium imaging in Caenorhabditis elegans,
we show that a large proportion of neurons
across the brain share information by engaging in co- 
ordinated, dynamical network activity. This brain 
state evolves on a cycle, each segment of which re- 
cruits the activities of different neuronal sub-popula- 
tions and can be explicitly mapped, on a single trial 
basis, to the animals’ major motor commands. This 
organization defines the assembly of motor com- 
mands into a string of run-and-turn action sequence 
cycles, including decisions between alternative be- 
haviors. These dynamics serve as a robust scaffold 
for action selection in response to sensory input. 
This study shows that the coordination of neuronal 
activity patterns into global brain dynamics underlies 
the high-level organization of behavior. 

INTRODUCTION 

Behavior is composed of individual motor actions and motifs, 
such as limb movements or gaits, which do not achieve organ- 
ismal goals unless they are orchestrated into longer-lasting 
action sequences and behavioral strategies, like navigation, 
grooming, or courtship (Anderson and Perona, 2014; Gray 
et al., 2005; Seeds et al., 2014). Ethologists often make quantita- 
tive descriptions of this higher-level organization using state 
transition diagrams, consisting of distinct, repeatable high-level 
motor states and switches between them (Anderson and Perona, 
2014). The brain’s representation of behavior must account for 
both detailed metrics of individual actions (e.g., strength and 
extent of movement or speed of gait), as well as for their higher 
level orchestration. Identifying how these aspects of behavior 
correspond to measurable neural activity is a necessary step 
toward understanding how the brain encodes and produces
behavior. Recent studies in invertebrate motor ganglia and
mammalian cortex show that selection, execution, and shaping 
of motor programs correspond to neural activity patterns across 
large neuronal populations. These studies show that, despite the 
participation of hundreds of sampled neurons, their activity is 
coordinated, and meaningful signals can thus be reduced to 
far fewer dimensions. Moreover, neuronal populations encode 
information dynamically (Briggman et al., 2005; Bruno et al., 
2015; Churchland et al., 2012; Cunningham and Yu, 2014; Har- 
vey et al., 2012; Jin et al., 2014; Mante et al., 2013). For practical 
reasons, recordings in these studies have been performed over 
short intervals that encompass individual motions or brief behav- 
ioral tasks. Hence, the neuronal mechanisms that govern the 
continuous control of behavior and its time course, encompass- 
ing long-lasting and repeated action sequences, remain enig- 
matic. Furthermore, approaches have been typically limited by 
the need to average across trials or to sub-sample from local 
brain regions or motor ganglia. Recently, the first brain-wide sin- 
gle-cell-resolution functional imaging studies, in zebrafish and fly 
larvae and adult C. elegans, revealed motor-related population 
dynamics correlated across distant brain regions. These data 
suggest that behaviorally relevant neural representations might 
occur at the level of global population dynamics and highlight 
the benefit of brain-wide sampling (Ahrens et al., 2012, 2013; 
Lemon et al., 2015; Panier et al., 2013; Prevedel et al., 2014; 
Schrodel et al., 2013). 

The nematode C. elegans is an attractive model system to 
address these problems, due to its stereotypic nervous system 
of just 302 identifiable neurons grouped into 118 anatomical 
symmetry classes (White et al., 1986). However, prior to the 
availability of whole-brain imaging, past studies had not ex- 
plored distributed or population dynamics in C. elegans. 
Instead, identified interneurons and pre-motor neurons have 
been described as dedicated encoders of specific sensory 
inputs or motor outputs and are commonly placed in a context 
of isolated sensory-to-motor pathways (see the following refer- 
ences for examples: Chalasani et al., 2007; Donnelly et al., 
2013; Gray et al., 2005; Ha et al., 2010; Iino and Yoshida,
2009; Kimata et al., 2012). However, these pathways largely 
overlap and are embedded in a horizontally organized and re- 
currently connected neuronal wiring diagram (Varshney et al., 
2011; White et al., 1986). Moreover, recent functional imaging 
studies revealed that many of these circuit elements encode
motor rather than sensory related signals (Gordus et al., 2015; 
Hendricks et al., 2012; Laurent et al., 2015; Li et al., 2014; Luo 
et al., 2014). Taken together, these considerations argue against 
separable feed-forward sensory pathways and instead support 
the hypothesis that sensorimotor processing is performed by 
distributed, shared networks operating on widespread motor 
representations. 

In the present study, we provide evidence for this hypothesis 
by showing that many neurons in the C. elegans brain participate 
in a pervasive dynamic population state, collectively represent- 
ing the major motor commands of the animal. The time evolution 
of the neural state is directional and cyclical, corresponding to 
the sequential order of the animals’ repeated actions. These 
network dynamics interface with sensory representations as 
early as at the first synapse downstream of sensory neurons 
and provide a robust scaffold for sensory inputs to modulate 
behavior. Our work suggests that high-level organization of 
behavior is encoded in the brain by globally distributed, contin- 
uous, and low-dimensional dynamics. 

RESULTS 

Brain-wide Activity Evolves on a Low-Dimensional 
Attractor-like Manifold 

We performed whole-brain single-cell-resolution Ca^"^ imaging 
with a pan-neuronally expressed nuclear Ca^"^ sensor in animals 
immobilized in a microfluidic device (Schrodel et al., 2013). In 
each animal (n = 5), we recorded the brain activity under environ- 
mentally constant conditions for 18 min at a rate of ~2.85 vol- 
umes per second. The imaging volume spanned all head ganglia, 
including most of the worm’s sensory neurons and interneurons, 
as well as all head motor neurons and the most anterior ventral 
cord motor neurons (White et al., 1986) (Figures 1A and 1B). In 
each recording, we detected 107-131 neurons and were able 
to determine the cell class identity of most of the active neurons. 
Figures 1C and S1A show a typical multi-neuron time series 
during which a large proportion of imaged neurons exhibited 
discernible Ca²⁺ activity patterns. We performed principal components
analysis (PCA) on the time derivatives of the normalized
Ca²⁺ traces (Figures 1C-1E). This method produces neuron
weight vectors, termed principal components (PCs); here, PCs 
are calculated based on the covariance structure found in the 
normalized data (Jolliffe, 2002). For each PC, a corresponding 
time series (temporal PC) was calculated by taking the weighted 
average of the full multi-neural time series. Temporal PCs represent
signals shared by neurons that cluster based on their correlations.
We found a low-dimensional, widely shared, dominant
signal: the first three PCs accounted for 65% of the full dataset
variance (Figure 1E). We performed PCA on the time derivatives
of Ca²⁺ traces because the resulting PCs produced more
spatially organized state space trajectories, described below. 
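
As an illustration of the analysis described above, the following is a minimal sketch, not the authors' implementation, of PCA applied to the time derivatives of normalized traces. It assumes a NumPy array `traces` of shape (time points, neurons); the function name, the finite-difference derivative, the projection used to obtain temporal PCs, and the synthetic demo data are all assumptions made here for illustration.

```python
import numpy as np


def temporal_pcs_from_derivatives(traces, n_components=3):
    """PCA on time derivatives of traces shaped (n_timepoints, n_neurons)."""
    # Finite-difference time derivative of every neuron's trace.
    d = np.diff(traces, axis=0)
    # Center each neuron so the covariance is taken about its mean.
    d = d - d.mean(axis=0)
    # Eigen-decomposition of the neuron-by-neuron covariance matrix.
    cov = np.cov(d, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    # eigh returns ascending eigenvalues; sort to descending variance.
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    weights = eigvecs[:, :n_components]       # neuron weight vectors (PCs)
    temporal = d @ weights                    # temporal PCs (weighted sums)
    integrated = np.cumsum(temporal, axis=0)  # time integral of temporal PCs
    explained = eigvals[:n_components] / eigvals.sum()
    return weights, temporal, integrated, explained


# Synthetic demo (hypothetical data): two antagonistic groups of three
# neurons driven by one shared oscillation plus a little noise.
rng = np.random.default_rng(0)
t = np.linspace(0, 60, 600)
shared = np.sin(2 * np.pi * t / 20)
signs = np.array([1, 1, 1, -1, -1, -1])
traces = shared[:, None] * signs[None, :] + 0.002 * rng.standard_normal((600, 6))

weights, temporal, integrated, explained = temporal_pcs_from_derivatives(traces)
```

In this toy case the two groups receive PC1 weights of opposite sign, which mirrors the antagonistic weighting reported for reversal-promoting and forward-promoting neurons.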

The time integral of temporal PC1 displayed a strong oscilla- 
tory time course with variable period, sharp transitions, and pro- 
longed plateaus and troughs. This pattern derived from the 
antagonistic activity of two groups of interneurons and motor 
neurons (Figure 1C, right) previously implicated in controlling 
the switch between forward- and backward-directed crawling 
(Table S1 summarizes published results). Neurons previously re- 
ported to have opposing roles were observed to have opposing 
signs of their PC1 weights (e.g., AVA promoting backward
crawling and AVB promoting forward crawling). PC2 and PC3
received high contributions from head motor neurons. Two of 
these neurons (SMDV and RIV) have been implicated in postural 
changes required for navigational re-orientation maneuvers 
(termed omega turns) (Gray et al., 2005). However, the neuronal 
weights of all three PCs indicated contributions from many neu- 
rons (Figure 1C). PC1-3 weights and their variance contributions 
were consistent across the five datasets (Figures S2A-S2D). 

The phase plot of temporal PC1-3 showed that the neural 
state’s time evolution was cyclical — i.e., the same states were 
repeatedly revisited within a trial, such that successive trajectory 
cycles formed spatially coherent bundles (Figure 1F and Movie 
S1). Consequently, the entire neural state trajectory traced out 
a manifold, which is defined here as the sub-volume in PCA 
space occupied by the neural state trajectory. When mapped 
onto the neural trajectory, individual neurons’ activity rise and 
fall phases occupied class-specific sub-regions on the manifold 
(Figures 1G and S1B). All five recordings displayed a similarly 
structured manifold (Figure S2E). Thus, a large group of interneurons
and motor neurons produces a cyclical, low-dimensional,
time-varying population state signal.

Interneurons and Head Motor Neurons Reliably Encode 
Motor State and Graded Motion Parameters 

Next, we aimed for a functional interpretation of the neural state 
manifold and its properties. Each manifold sub-region was 
labeled specifically and consistently by different subsets of 
neurons, some of which have been previously implicated in 
the action sequence termed a pirouette (Table S1), which is 
central to navigation (Gray et al., 2005; Pierce-Shimomura 
et al., 1999). During pirouettes, worms switch transiently from 



Figure 1. Brain-wide Activity Is Organized in a Low-Dimensional, Cyclical Neural State Space Trajectory

(A) Maximum intensity projection of a representative sample recorded under constant conditions.

(B) Single z plane overlaid with segmented neuronal regions.

(C) Heat plot of fluorescence (ΔF/F) time series of 109 segmented head neurons, one neuron per row. Labeled neurons indicate putative cell IDs. Ambiguous
neuron IDs are in parentheses (see Figure S1 for additional candidates). Neurons are colored and grouped by their principal component (PC1-3) weights and
signs, which are shown by the bar plots on the right.

(D) Integrals of the first three temporal PCs.

(E) Variance explained by first ten PCs; black line indicates cumulative variance explained.

(F) Phase plot of first two temporal PCs colored by direction of time evolution indicated by color key.

(G) Phase plots of first two (left) and first three (right) temporal PCs. Colored balls indicate Ca²⁺ rises of three example neurons indicated by legend.

See also Movie S1 and Figures S1 and S2.
forward- to backward-directed crawling, termed a reversal (Figures
2A and 2B). They then resume forward crawling with a
concomitant turn along the dorsal or ventral body axis; worms 
crawl lying on their left or right side (Figures 2C and 2D). We 
performed Ca²⁺ imaging experiments of representative neurons
in freely moving worms while simultaneously recording their 
behavior with an infrared (IR) camera (Faumont et al., 2011). 
We selected neurons based on their PC weights and availability 
of specific promoters to drive GCaMP expression. As with brain- 
wide imaging experiments, animals were recorded 5-10 min af- 
ter removal from food, a paradigm in which pirouettes contribute 
to a local search strategy (Gray et al., 2005). Behavioral analysis 
of the IR movies showed that reversal initiations were each pre- 
ceded by a reduction in crawling speed (slowing bout), though 
20% of slowing bouts did not lead to a reversal (Figures S3A 
and S3B). We thus defined slowing as an additional behavioral 
state and represent pirouettes together with forward crawling 
as action sequences composed of forward run, slowing, re- 
versal, resume forward via dorsal turn, and resume forward via 
ventral turn actions, which is depicted in a state transition dia- 
gram (Figure 2E). 

We first examined Ca²⁺ dynamics in neurons with high positive
or negative PC1 weight. An example trace of RIM neurons
is shown in Figure 2F. We found that the Ca²⁺ signals of RIM
resided in stable low states during forward-directed crawling
and that Ca²⁺ rises occurred exclusively during reversals (Figure
2F). The slope of these signals correlated with the speed of
reverse crawling (Figure 2G). Although reversals are of variable
duration (Gray et al., 2005; Pokala et al., 2014) (Figure S3B),
RIM Ca²⁺ rise onsets precisely aligned with reversal start,
and RIM Ca²⁺ fall onsets aligned with reversal end. This relationship
was highly reliable: approximately 90% of reversals
were associated with a detectable RIM Ca²⁺ rise phase (Figure
2H, top), and the remainder were very short reversals
where small Ca²⁺ signals might have been occluded by noise
(Figure 2F). All clearly discernible RIM Ca²⁺ rises above our
signal-to-noise threshold occurred during reversals. We found
such a relationship of Ca²⁺ rise and fall phases with respect to
reversal events for all tested neurons with positive PC1 weight
(RIM, AVA, AVE, AIB), while neurons with negative PC1 weight
(RIB, AVB, RMEV) showed the inverse relationship (Figures 2H
and S3C-S3H). All these neurons’ activities changed as reliably
as RIM at both forward-reverse and reverse-forward
transitions.
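
The correlation between Ca²⁺ rise slope and reverse crawling speed, together with the permutation test named in the legend of Figure 2G, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names, the shuffle count, the add-one correction, and the toy data are hypothetical.

```python
import random


def r_squared(x, y):
    """Squared Pearson correlation, equal to R^2 of a one-predictor linear fit."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)  # assumes neither input is constant


def permutation_p_value(x, y, n_shuffles=2000, seed=0):
    """Return observed R^2 and the fraction of shuffles that match or beat it."""
    rng = random.Random(seed)
    observed = r_squared(x, y)
    y_shuffled = list(y)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(y_shuffled)  # break the pairing between x and y
        if r_squared(x, y_shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (hits + 1) / (n_shuffles + 1)


# Toy data (hypothetical): speed rises with signal slope, plus a small wobble.
ca_slope = [0.1 * i for i in range(30)]
reverse_speed = [2.0 * s + 0.05 * ((i % 3) - 1) for i, s in enumerate(ca_slope)]

r2, p = permutation_p_value(ca_slope, reverse_speed)
```

With a finite number of shuffles the smallest reportable p value is 1 / (n_shuffles + 1), so a threshold such as p < 0.0001 requires more than 10,000 shuffles.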

Besides this common property of PC1 neurons, class-specific
relationships between neuronal activity and locomotion were revealed
by Ca²⁺ imaging in freely moving animals. RIM and AVA Ca²⁺ rise
slopes, and AVE Ca²⁺ signal magnitude, were graded and correlated
with reverse crawling speed (Figures 2G, S3I, and S3J). Unlike
RIM, AVA, and AVE, the activity of AIB did not show strong
correlations with reverse crawling speed (Figures S3E and
S3K); however, small AIB Ca²⁺ transients co-occurred with forward
slowing bouts, even when no reversal followed (Figures
S3E and S3Q). Consistent with this, AIB Ca²⁺ rise phases preceded
the forward-to-reversal transition by ~1 s on average (Figure
2H). The continuous activity of AVB and RIB, unlike RMEV,
showed strong correlations with forward crawling speed (Figures
S3L-S3P; see also Li et al., 2014). Consistent with this, AVB and
RIB Ca²⁺ fall phases preceded the forward-to-reverse transition
by ~1 s on average (Figure 2H).

Next, we examined the activity of SMDV head motor neurons 
as representative neurons with strong PC2/3 weight. Resump- 
tion of forward crawling begins with a dorsal or ventral bend, 
which was biased (71%/29%) in the ventral direction. The 
head flexure during post-reversal turns is graded and increased 
compared to normal forward crawling, especially for ventral 
bends (Figure 3A). SMDV exhibited Ca²⁺ rises at the transition
from reverse to forward crawling; importantly, these rises 
occurred exclusively during ventrally and not dorsally directed 
events (Figures 3B-3D). The magnitude of these signals corre- 
lated with ventral head-bending flexure (Figure 3E). 

The major qualitative divergence in neural activity patterns be- 
tween the freely moving single neuron and restrained whole- 
brain setups that we observed was the absence, in freely moving 
worms, of prolonged high phases in neurons with positive PC1 
weight. Using RIM as an exemplar, we first ruled out that this difference
was a consequence of nuclear localization of the Ca²⁺
reporter used in whole-brain imaging (Figures S3R-S3T). We 
then dissociated the two major differences in these experimental 
conditions by performing experiments in either pharmacologi- 
cally or physically immobilized worms. While low doses of 
the paralyzing agent tetramisole caused RIM high phases in 
conjunction with prolonged slowly executed reversals, physical 
immobilization alone also caused RIM high phases (Figures 
S3U-S3X). These data suggest that impeded motor execution 
leads to a prolongation of the reversal, which is correlated with 
sustained Ca²⁺ levels in reversal-promoting neurons.

In summary, the investigated neuronal activities showed both 
(1) sharp transitions depending on discrete motor state (i.e., for- 
ward versus backward crawling, ventral versus dorsal turning di- 
rection) and (2) graded information about motion parameters 
(i.e., forward and reverse crawling speed and head bending 
flexure). Acute motor state reliably matched the activities of the 
associated neurons on a single event basis. Importantly, when 
examining neuron activity periods mapped onto the neural state 
manifold, we observed that neurons encoding the same behav- 
ioral state in freely moving animals shared the same manifold 
sub-regions with rare exception (Figure S1B).

Manifold Branches and Bundles Exhibit Distinct 
Neuronal Recruitment Patterns 

Having determined that the neural state manifold is a composite 
of motor-related signals, we next aimed for a quantitative
description thereof. We first segmented the global brain cycle
into four behaviorally relevant phases using the left AVA neuron
(AVAL) as a reference: a trough in AVAL Ca²⁺ defined the LOW
state, a Ca²⁺ increase the RISE state, a Ca²⁺ plateau the HIGH
state, and a Ca²⁺ decrease the FALL state (Figure 4A). We chose
this single neuron class because it is among the highest PC1 
contributors, participated in every brain cycle, and, unlike tem- 
poral PCs, exhibited sharply discernible transitions; however, 
other strongly PC1-contributing neurons such as RIM could
also be used for this purpose. We validated that the appearance
of lasting plateau and smooth transition states was not due to
temporal filtering effects of Ca²⁺ imaging: all four states were
readily discernible in AVA membrane voltage recordings, and 
Figure 2. Distributed Encoding of Motor State and Crawling Speed by Interneurons in Freely Moving Worms

(A-D) Motor states of pirouette action sequence. White dotted lines show crawling trajectory. Arrows indicate crawling direction.

(E) Behavioral state transition diagram indicating motor states as circles and possible transitions as arrows.

(F-H) Ca²⁺ imaging in freely moving animals.

(F) Example trace showing RIM activity as normalized GCaMP/mCherry fluorescence ratio (black) and corresponding crawling speed (green). Pink bars overlay
reverse crawling periods. Asterisk indicates reversal with no detectable RIM activity peak.

(G) Regression analysis of crawling speed versus RIM Ca²⁺ signal slope. R² indicates goodness of linear fit for instantaneous and maximum (in parentheses)
reverse speed (red) and instantaneous forward speed (gray). Permutation test p value ****p < 0.0001 indicates probability that correlation was obtained by chance.

(H) Average Ca²⁺ signals of the indicated neurons triggered to reversal start (left) or end (right). Upper and lower traces represent 90th and 10th percentile of all
data, respectively. Number of recorded worms and reversal events are indicated.

See also Figure S3.



Cell 163, 1-14, October 22, 2015 ©2015 Elsevier Inc. 5 








Please cite this article in press as: Kato et al., Global Brain Dynamics Embed the Motor Command Sequence of Caenorhabditis elegans, Cell
(2015), http://dx.doi.org/10.1016/j.cell.2015.09.034







Figure 3. SMDV Signals during Ventral, but Not Dorsal, Post-Reversal Turns

(A) Fractional histogram showing postural angle of first post-reversal head bend (ventral, yellow; dorsal, orange). Numbers indicate percentage of all post-reversal
head bends. Dashed vertical black line shows median of all other head bends (no difference between ventral and dorsal).

(B-E) SMDV Ca²⁺ imaging in freely moving animals.

(B) Example trace showing SMDV activity as normalized GCaMP/mCherry fluorescence ratio (black) and corresponding head-bend angle (purple). Pink bars overlay
reverse crawling; yellow and orange bars overlay ventral and dorsal post-reversal head bends, respectively.

(C and D) Average SMDV Ca²⁺ signals triggered to reversals ending with ventral (C) or dorsal (D) head bends. Upper and lower traces represent 90th and 10th
percentile of all data, respectively. Number of recorded worms and events are indicated.

(E) Regression analysis of normalized peak post-reversal head-bend angle versus SMDV Ca²⁺ signal. Ventral and dorsal bends are shown in yellow and orange,
respectively. Black open circles show an equal number of randomly selected head-bend peaks during regular forward movement. R² indicates goodness of linear
fits to ventral (V), dorsal (D), and respective control groups. Permutation test p values (****p < 0.0001, **p < 0.01, not significant) indicate probability that
R² value was obtained by chance.




[Figure 3 panels: legend entries Ventral switch, Dorsal switch, Control bends; x axes Time from end (s) and Normalized head-bend angle (s.d. units)]



we calculated an estimate of low-pass filtering caused by nuclear
Ca²⁺ imaging, producing a maximum delay in signal peaks
of less than 1.1 s (Figure S4). Although neurons with a common
relationship to behavior were recruited to the same sub-regions 
of the manifold, their precise phase onsets and offsets varied. In 
order to quantify this observation, for each onset of RISE and 
FALL, we created a vector containing the phase delays of all 
recruited neurons (Figure S5) (see Supplemental Experimental 
Procedures for details). Across the five datasets, we detected 
121 RISE and 123 FALL transitions and observed characteristic 
phase delay distributions for each neuronal class (Figure S5). 
Next, we searched for structure across neuronal classes by per- 
forming k-means clustering separately for the RISE and FALL 
phase timing vectors; we found that both could be significantly 
clustered into two groups each, which we termed RISE1/2 and 
FALL1/2, respectively. RISE1 differed from RISE2 mostly based 
on different timing of neurons; e.g., AIB and RIB activity exhibited 
phase advances during RISE1 (Figure S5). FALL1 and FALL2 
mostly differed by mutually exclusive head motor neuron recruit- 
ments, SMDV/RIV versus RMED/ventral ganglion head motor 
neuron (likely SMB, SMDD, or RMF) (Figure S5). The precise 
ordering detected by this method may be affected by differential 
Ca²⁺ dynamics in different cells; however, the reproducible clustering
would be preserved. Using this six-state classification
(LOW, RISE1/2, HIGH, and FALL1/2), we labeled the neural state 
trajectory and found that each state classifies a distinct bundle of 
trajectory segments (Figures 4A and 4B and Movie S2). Thus, the 
two methods (PCA and phase timing analysis) revealed the same 
dynamical structure in the neural data. Bundle classification 
enabled us to calculate average neural state trajectories illustrating
the canonical brain cycle (Figure 4C). Note that, without
this single-trial clustering analysis, the cycle-averaged trajectory 
would be reduced to a single loop in neural state space. Further- 
more, bundle classification enabled us to estimate a contour sur- 
face of the manifold (Figure 4D and Movie S3), where the extents 
correspond to the standard deviations (SDs) by which the trajec- 
tory path diverges from the canonical (average) path. The trajec- 
tory segments across all cycles are strongly bundled; the mean 
pairwise distance of points across any two phase-registered trajectory
time points within a bundle is ~10% of the diameter of the
full trajectory, and their mean angular divergence is 22° versus 
90° expected from uncorrelated orientations. In summary, we 
find that many active neurons across the brain are tightly bound 
to reproducible and smooth population dynamics. 
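
The four-state segmentation of a reference trace (LOW, RISE, HIGH, FALL) that underlies this analysis can be sketched as follows. This is a simplified illustration, not the procedure from the Supplemental Experimental Procedures: the slope and level thresholds, the central-difference derivative, and the function names are assumptions introduced here.

```python
def segment_states(trace, dt, slope_thresh, level_thresh):
    """Label each sample of a trace (length >= 2) as LOW, RISE, HIGH, or FALL."""
    states = []
    last = len(trace) - 1
    for i in range(len(trace)):
        # Central finite difference, one-sided at the two ends.
        lo, hi = max(i - 1, 0), min(i + 1, last)
        slope = (trace[hi] - trace[lo]) / ((hi - lo) * dt)
        if slope > slope_thresh:
            states.append("RISE")
        elif slope < -slope_thresh:
            states.append("FALL")
        elif trace[i] >= level_thresh:
            states.append("HIGH")   # flat and elevated: plateau
        else:
            states.append("LOW")    # flat and near baseline: trough
    return states


def run_lengths(states):
    """Collapse per-sample labels into (state, number_of_samples) runs."""
    runs = []
    for s in states:
        if runs and runs[-1][0] == s:
            runs[-1][1] += 1
        else:
            runs.append([s, 1])
    return [(s, n) for s, n in runs]


# Toy trapezoidal trace (hypothetical): trough, ramp up, plateau, ramp down, trough.
trace = ([0.0] * 5 + [0.2, 0.4, 0.6, 0.8] + [1.0] * 5
         + [0.8, 0.6, 0.4, 0.2] + [0.0] * 5)
runs = run_lengths(segment_states(trace, dt=1.0, slope_thresh=0.15, level_thresh=0.5))
```

Each RISE or FALL run found this way marks one transition onset, relative to which the phase delays of other neurons could then be measured.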

The Motor Command Sequence Is Embedded in Neural 
State Space 

Remarkably, the relationships neurons exhibited with behavioral 
transitions (Figures 2H, 3C, and 3D) matched their phase 
relationships with the six-state global brain cycle without exception.
Assembling all of the neuronal-behavioral correlate information
gathered via Ca²⁺ imaging in freely moving worms enabled
us to unambiguously map the worm’s major motor command
states onto separate bundles of the neural state manifold (Figures
4B-4E): RISE1 or RISE2, in conjunction with HIGH, correspond
to reversals, with HIGH corresponding to the sustained
reversal seen only in immobilized animals. FALL1 corresponds
to the post-reversal ventral turn and FALL2 to the dorsal turn.
FALL1 and FALL2, in conjunction with LOW, correspond to
forward crawling. Slowing mapped to final sections of LOW
preceding RISEs (Figures 4B-4E, see Experimental Procedures
for the detailed mapping rules). Thus, the neural state manifold, 
on a single trial basis, embeds the pirouette command sequence 
described in the state transition diagram (Figures 2A-2E). The 
neural trajectory follows the same unidirectional sequence 
through manifold sub-regions as the corresponding behavioral 
sequence executed by freely moving worms during pirouettes. 
This observation motivated us to redraw the state transition dia- 
gram (Figure 2E) as a continuous flow graph (Figure 4E). The 
neuronal manifold, in addition to embedding the command 
sequence, also contains information about graded locomotion 
parameters like the drive underlying crawling speed (Figure 4F, 
see Experimental Procedures for the detailed mapping rules). 
Both motor command states, as well as speed drive, appear 
organized on the manifold; i.e., separable sub-regions unambig- 
uously delimit the distinct command states (Figure 4B) and 
proximal traversals on the manifold exhibit similar speed drives 
(Figure 4F). This manifold organization was clearly apparent in 
all five recordings (Figure S2E). 

Each branching region of the manifold represents a decision 
where the subsequent motor state is determined. To explore 
the process of decision execution, we quantified the time 
course of trajectory separation when branching into RISE1 versus 
RISE2 and FALL1 versus FALL2 and subsequent merging. This 
approach calculates how significantly trajectory segments bundle 
in PCA space when tested against random shuffling of member- 
ship in RISE1 versus RISE2 or FALL1 versus FALL2 clusters (see 
Supplemental Experimental Procedures for details). Consistent 
with the significant clustering of neuronal recruitment vectors 
described above, there was significant separation during the 
RISE and FALL phases (Figures 4G and 4H). Interestingly, this 
also uncovered memory effects: a RISE1 versus RISE2 branch 
choice could, on average, be predicted during the preceding 
FALL period (Figure 4G), and consistent with the previous, 
FALL1 versus FALL2 trajectories remained significantly unmixed 
in the following RISE phases (Figure 4H). Moreover, RISE1 and 
RISE2 are associated, respectively, with long and short preceding 
LOW states (Figure 4I). Both results indicate that the trajectory 
path history influences the future branch choice decision. 
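
The shuffle test for bundle separation described above can be sketched at a single phase-registered time point as follows. This is a minimal illustration rather than the authors' procedure: the test statistic (mean Euclidean distance between points carrying different labels), the shuffle count, the function names, and the toy coordinates are assumptions.

```python
import math
import random


def mean_between_group_distance(points, labels):
    """Mean Euclidean distance over all pairs of points with different labels."""
    total, count = 0.0, 0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if labels[i] != labels[j]:
                total += math.dist(points[i], points[j])
                count += 1
    return total / count  # assumes at least two distinct labels


def separation_p_value(points, labels, n_shuffles=1000, seed=0):
    """Fraction of label shuffles whose separation matches or exceeds the observed one."""
    rng = random.Random(seed)
    observed = mean_between_group_distance(points, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # random reassignment of cluster membership
        if mean_between_group_distance(points, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_shuffles + 1)


# Toy neural-state points (hypothetical) in PC1-3 space at one time point.
points = ([(0.1 * k, 0.0, 0.0) for k in range(6)]
          + [(5.0 + 0.1 * k, 0.0, 0.0) for k in range(6)])
separated = ["RISE1"] * 6 + ["RISE2"] * 6   # labels follow the two clouds
mixed = ["RISE1", "RISE2"] * 6              # labels interleaved across clouds

p_separated = separation_p_value(points, separated)
p_mixed = separation_p_value(points, mixed)
```

Repeating the test at successive time points before and after a branch point would give a time course of p values, which is the kind of trace shown in Figures 4G and 4H.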

In contrast to the state transition diagram, the neural state 
manifold captures the continuous dynamical structure of motor 
commands and their transitions and contains additional informa- 
tion about graded metrics of motion, like crawling speed and 
postural flexure. Here, we define the terms command state and 
speed drive as the brain’s internal high-level representations of 
the underlying motor programs, since these are readily observ- 
able in immobilized animals in the absence of motor execution. 

Neural State Dynamics Persist When a Hub Output 
Neuron Is Inhibited 

The presence of a representation of the pirouette sequence in 
immobilized animals suggests that the neuronal population dy- 
namics are primarily internally driven and thus represent de- 
scending motor commands that can operate in the absence of 
motor feedback. We sought to further test this hypothesis. 
Despite the largely recurrent connectivity of the C. elegans wiring 
diagram, a bottleneck exists from the head ganglia to body 
motor neurons: AVA pre-motor interneurons are anatomical
network hubs linking head ganglia neurons to A-class ventral
cord motor neurons, which mediate the reversal motor program
(Chalfie et al., 1985; Kawano et al., 2011; Varshney et al., 2011).
Acutely silencing AVA via transgenic expression of a histamine-gated
chloride channel (HisCl) (Pokala et al., 2014) abolished
reversals in freely moving worms (Figure 5A). As expected, simi- 
larly silenced animals under whole-brain imaging (n = 5 record- 
ings) showed substantial attenuation of AVA activity and strong 
uncoupling of AVA from the global brain cycle (Figures 5B and 
S6A). Additionally, activity of the reverse interneurons AVE and 
RIM, which are connected to AVA via gap junctions (White 
et al., 1986), was slightly attenuated (Figure 5B). However, their
phase relationships with most other neurons appeared normal 
(Figure S6C). A-class ventral cord motor neurons, the principal 
output targets of AVA, also showed significant attenuation 
(Figure 5B). Despite these effects, the cyclical dynamics and 
neuronal recruitment patterns were largely preserved (Figures 
5C, 5D, and S6). The distributions of network state durations 
were unchanged, with the exception of a decrease in HIGH state 
duration, suggesting that network HIGH state prolongation was 
due in part to reinforcement from AVA (Figure 5E). These obser- 
vations raised the possibility that the global brain cycle was also 
intact in freely moving worms with AVA, and therefore reversals, 
inhibited. Unlike in wild-type animals, where 92.5% of turns 
occurred in conjunction with a preceding reversal, in worms 
with silenced AVA neurons, none of the turns were preceded 
by reversals; instead, 68% of turns (32 out of 47) were preceded 
by prolonged slowing or pauses, while the rest occurred during 
apparently normal forward locomotion. Imaging RIM in AVA- 
silenced freely moving animals revealed the presence of sus- 
tained RIM activity during these prolonged slowing or pauses 
preceding normal turning events (Figures 5F-5H). Such tran- 
sients were never seen in controls, where RIM was only active 
during reversals. In AVA-silenced animals, RIM activity often 
entered HIGH states during prolonged pauses, further support- 
ing the above interpretation that the HIGH state occurs due to 
the absence of effectual motor execution (Figures 5F and S3U- 
S3X). These results show that the cyclical time course of the 
brain-wide motor command is maintained in the absence of 
reversal execution, the only effect of which is a prolonged 
HIGH state duration. Analogously, behaviors that are not AVA- 
output mediated (slowing and turns) are also preserved. Further, 
these data imply that AVA is not a privileged generator of motor 
commands but should instead be characterized as an output- 
facing member of the collectively oscillating interneuron group. 

Entrainment of the Global Brain Cycle by Sensory 
Stimulation 

Next, we investigated how these collective network dynamics 
interact with a chemosensory input. Under whole-brain imaging, 
we stimulated oxygen chemosensory neurons with consecutive 
oxygen up- and down-shifts (21% versus 4%), a protocol previously
shown to reliably activate BAG, URX, and AQR oxygen
sensory neurons and to entrain pirouette behavior with high
pirouette probability at 21% oxygen and low at 4% (Figures
S7A and S7B; see also Busch et al., 2012; Schrödel
et al., 2013; Zimmer et al., 2009). To our surprise, with the 
exception of one ventral ganglion neuron class (RIG or RIF) 








Figure 4. The Neural State Manifold Embeds the Action Sequence and Exhibits Organized Analog Speed Drive 

(A) Phase segmentation of example AVAL trace (left). Four-state brain cycle (middle). Phase timing analysis and clustering leads to six-state brain cycle (right). See 
also Figures S4 and S5. 

(B) Phase plot of the same trial shown in Figure 1, colored by six-state brain cycle plus FORWARD SLOWING command state in purple (see below).

(legend continued on next page) 






(Figure S7C), we did not detect single-neuron representations of 
sensory stimulus downstream of sensory neurons (n = 13 recordings).
Moreover, the topology of the neural state manifold did not
change upon stimulation; however, there were some magnitude 
effects on the amplitude of temporal PC1 (Figure 6A). Based on 
the strong entrainment effect the stimulation protocol has on 
pirouette behavior, we expected that oxygen concentration 
should affect bundle occupancy on the manifold. Indeed, the 
stimulus protocol entrained the global phase of the brain cycle 
so that the probability of the reverse motor command state 
declined during 4% oxygen periods and increased during 21%
oxygen periods (Figures 6B and 6C), indicating a successful 
sensorimotor transformation in our preparation. Consistent 
with these findings, Ca²⁺ rises in BAG neurons during the
HIGH state evoked immediate FALL1 or FALL2 transitions in
56% (30/54, n = 13 recordings) of all instances (see Figure S7C
for an example). Interestingly, in 22 out of the 24 remaining instances,
secondary BAG Ca²⁺ rises coincided with a FALL1 or
FALL2 transition; these were the only times when we observed
secondary BAG transients (see Figure S7C for an example).
This finding suggests the existence of a feedback mechanism
eliciting or gating secondary Ca²⁺ rises in the BAG sensory neurons,
demonstrating that variability in the BAG sensory response
profile (Zimmer et al., 2009) can be explained when the underly- 
ing brain state is known to the observer. 
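
The entrainment measure described above, the probability of the reverse motor command state in each oxygen period, can be sketched as follows. This is an illustrative simplification: the function names, the per-frame pooling across recordings, and the toy state labels are assumptions; only the mapping of RISE1, RISE2, and HIGH to the reverse command state is taken from the text.

```python
def reverse_state_probability(recordings, reverse_states=("RISE1", "RISE2", "HIGH")):
    """Per-frame fraction of recordings that are in a reverse command state.

    recordings: list of equal-length lists of per-frame state names.
    """
    n_frames = len(recordings[0])
    probs = []
    for f in range(n_frames):
        in_reverse = sum(1 for rec in recordings if rec[f] in reverse_states)
        probs.append(in_reverse / len(recordings))
    return probs


def mean_by_condition(probs, condition):
    """Average the per-frame probability separately for each stimulus condition."""
    sums, counts = {}, {}
    for p, c in zip(probs, condition):
        sums[c] = sums.get(c, 0.0) + p
        counts[c] = counts.get(c, 0) + 1
    return {c: sums[c] / counts[c] for c in sums}


# Toy example (hypothetical labels): two recordings, eight frames,
# four frames at 21% oxygen followed by four frames at 4% oxygen.
rec1 = ["RISE1", "HIGH", "HIGH", "HIGH", "FALL1", "LOW", "LOW", "LOW"]
rec2 = ["LOW", "RISE2", "HIGH", "HIGH", "HIGH", "FALL2", "LOW", "LOW"]
oxygen = ["21%"] * 4 + ["4%"] * 4

probs = reverse_state_probability([rec1, rec2])
occupancy = mean_by_condition(probs, oxygen)
```

A higher value for the 21% entry than for the 4% entry is what entrainment of the brain cycle by the stimulus would look like in this representation.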

Finally, we looked for sensory-evoked Ca²⁺ activity in the
major PC1 neuron classes AVA, AVE, and RIB in freely moving
animals. Together, AVE and RIB receive 47% of BAG neuron synapses
(White et al., 1986). Consistent with our whole-brain imaging
results, these neurons retained a tight correlation with motor
state and movement metrics and lacked obvious sensory encoding
activity; the magnitude of Ca²⁺ signals was subtly modulated
during the stimulation periods (Figures S7D-S7U).

In summary, neural state manifold organization is robust to a 
salient sensory input and thus stably encodes the motor command 
sequences of the worm under these conditions. The major effect of 
sensory input was to modulate the probability that the neural state 
resides on a particular segment bundle by driving the neural state 
along a lawful trajectory. The result is an entrainment of the global 
brain cycle, which is consistent with the entrainment of corre- 
sponding motor behaviors in freely moving worms. 

DISCUSSION 

In this work, we identify and characterize a brain-wide signal 
in C. elegans that dominates the neural activity time series.
Although our approach required the use of a nuclear localized
Ca²⁺ indicator, omitting the detection of subcellular Ca²⁺ signals
(Chalasani et al., 2007; Hendricks et al., 2012; Li et al., 2014), it 
reveals a pervasive motor state representation that is shared 
among most interneuron and motor neuron layers. The neural 
state trajectory exhibits directional, cyclical flow (Figure 1F) 
confined to a low-dimensional manifold (Figure 4D), organized 
into bundles (Figures 4B-4D) composed of stereotyped and 
smoothly changing neural activity vectors (Figure S5). Each mo- 
tor command within the pirouette action sequence is reliably rep- 
resented across several neurons. Neurons additionally encode 
graded parameters of locomotion, e.g., crawling speed and 
postural flexure (Figures 2, 3, and S3). These data enable us to 
unambiguously map behavioral commands onto sub-regions 
of the neural state manifold, enabling instantaneous behavioral 
decoding throughout an experimental trial (Figures 4B and 4E). 
We interpret these dynamics as corresponding to motor com- 
mands, as they can be decoupled from motor output either by re- 
straint (during whole-brain imaging) or manipulation of a major 
output neuron (Figure 5). Organized flow along the neural state 
manifold mediates the assembly of motor commands into action 
sequences (Figures 4B and 4E); it thus represents the high-level 
temporal organization of behavior upstream of the generation of 
the animal’s undulatory gait. This contrasts with population dy- 
namics in the motor ganglia of crustaceans, mollusks, and lam- 
preys that generate peristaltic and movement rhythms (Bruno 
et al., 2015; Grillner, 2006; Marder and Bucher, 2007). Interest- 
ingly, the brain’s forward and reversal motor commands are 
coupled to corresponding rise, high, fall, and low states in the 
B- and A-class ventral nerve cord (VNC) motor neurons (Figures 
1 and S1), which is consistent with previous studies performed in 
moving C. elegans. Additionally, VNC motor neuron activity ex- 
hibits gait-related rhythmic activity superimposed on these com- 
mand states (Kawano et al., 2011; Wen et al., 2012), which re- 
quires proprioceptive coupling to movement (Wen et al., 2012). 
Taken together, we propose that behavioral state is encoded 
in the brain and coupled to the motor periphery and that this 
coupling co-occurs with locally maintained rhythmic activity. 

These continuous neural dynamics embed behavioral motifs, 
described by the state transition diagram, and permit their su- 
perposition with graded motion metrics (Figure 4F). The process 
of decision making leading to execution of alternate behaviors 
can be observed as the time evolution of neural trajectories 
before the branches (Figures 4B-4D, 4G, and 4H). We propose 
that the phenomenon of global dynamics robustly and continu- 
ously encoding action sequence commands may be present in 



(C) Phase-registered averages of the two RISE phase and two FALL phase bundles colored by six-state brain cycle. Semi-transparent ovals denote trajectory 
bundle mixing regions. 

(D) Contour surface illustrating the neural state manifold colored by six-state brain cycle. 

(E) Flow diagram indicating the motor command states corresponding to the six-state brain cycle plus FORWARD SLOWING command state (purple). 

(F) The same phase plot colored by forward- and reverse-speed drive inferred from neural correlate decoding. Green trajectory segments indicate the 
SUSTAINED REVERSAL state, for which no drive correspondence is made. See Figure S2 for more examples. 

(G and H) Quantification of inter-bundle separation and mixing for RISE (G) and FALL (H) clusters. Traces show trial-averaged p values (shading indicates SEM; 
n = 5 animals) of mean normalized pairwise distance at instantaneous points in the past or future, which indicate the probability that the observed separation 
between bundles occurred by chance. This calculation was done in six dimensions (PC1-3 plus their derivatives) to incorporate directional information from the 
trajectory paths. 

(I) Distribution of LOW state durations preceding RISE1 or RISE2 segments. 

See also Movies S2 and S3. 
Figure 5. Global Brain Dynamics Persist when Decoupled from Motor Output

(A) Reversal events per minute for AVA::HisCl worms without (-His) or with (+His) histamine treatment. Each data point represents a single assay, n = 20-25
worms per assay. Horizontal lines show means. Mann-Whitney test, **p < 0.01.

(B) Shifts in trial-averaged root-mean-squared power of neuronal trace derivatives of AVA::HisCl worms with histamine treatment, relative to wild-type control (n =
5). Gray bars indicate non-significant power shifts; red bars indicate significant power shifts. Class-A motor neurons, typically 1-2 visible per recording, were
combined. Significance was determined using a permutation test, ****p < 0.0001, **p < 0.01, *p < 0.05.

(C and D) Integrated temporal PCs (C) and phase plots (D) of an example AVA::HisCl dataset.

(E) Distributions of state durations of AVA::HisCl (red) versus wild-type (blue) across multiple trials (n = 5).

(F-H) Ca2+ imaging of RIM in freely moving animals expressing HisCl in AVA.

(F) Example trace showing RIM activity in an AVA::HisCl worm after histamine treatment. Normalized GCaMP/mCherry fluorescence ratio (black) and corresponding
crawling speed (green) are shown. Omega turns are indicated with gray overlaid bars. These worms did not exhibit reversals.

(G and H) Averages of RIM Ca2+ signals in AVA::HisCl worms triggered to omega turn onset, for worms pre-incubated without (G) or with (H) histamine. Upper and
lower traces represent the 90th and 10th percentiles of all data, respectively. Number of recorded worms and omega turns are indicated.

See also Figure S6. 



higher animals with more sophisticated behavioral repertoires. 
This hypothesis is supported by the observation of smooth pop- 
ulation dynamics maintaining navigational plans in rodents (Har- 
vey et al., 2012). Its generality could be further tested by studying 
the basis of well-described sequential courtship and grooming 
behaviors in fruit flies (Dankert et al., 2009; Seeds et al., 2014). 



The ability to find dynamical structure solely on the basis of 
neural event timing (Figure S5) suggests that the structure 
we observe is not a particular consequence of the graded,
non-spiking nature of C. elegans neurons. We speculate that
neuronal population trajectories associated with action selection 
in leeches (Briggman et al., 2005), limb movement in monkey 
Figure 6. Entrainment of the Global Brain Cycle by Sensory Stimulation 

Animals were recorded and stimulated with the oxygen profile indicated in (B). 

(A) Phase plots of temporal PCs 1-2 from a representative recording. Top: behavioral command state coloring as in Figure 4B. Bottom: trajectory segments during 
the pre-stimulus period are labeled gray; segments during the 4% and 21% shift periods are labeled blue and red, respectively. 

(B) The trace shows the probability of reversal command state (REVERSAL1 + REVERSAL2 + SUSTAINED REVERSAL) calculated over n = 13 recordings. 

(C) Reversal command state probability as in (B) but averaged over the six down- and up-shift periods. p values were calculated by a resampling test and indicate
the probability that the stimulus-synced profile shape occurred from a randomly time-shifted stimulus pattern.

See also Figure S7. 



cortical areas (Georgopoulos and Carpenter, 2015; Shenoy 
et al., 2013), and speech in humans (Bouchard et al., 2013) 
may be sparsely sampled windows onto similarly well-orga- 
nized, smooth global dynamics. 

Our work establishes a framework for future studies aimed at 
embedding more fine-scaled behaviors beyond the discrete 
classifications of the state transition diagram, such as gradual 
steering commands (Iino and Yoshida, 2009) and locomotory 
gait (Stephens et al., 2008). By exploring more sophisticated 
sensory input paradigms and studying the animal in different 
contexts and life stages, we expect that the neural state 
manifold will be further sub-dividable and support the map- 
ping of other behavioral parameters. Additionally, in-depth 
analysis of whole-brain activity may uncover previously hidden 



aspects of behavior; for example, we found two types of rever- 
sals (corresponding to RISE1 and RISE2) in whole-brain activity 
that currently lack known behavioral correlates. Although AVA in- 
hibition had only subtle effects, systematically expanding this 
approach to other neurons and combinations thereof should 
reveal whether individual neurons or sub-ensembles are causal 
to brain dynamics. By probing the system with acute perturba- 
tion using optogenetics and imaging at finer timescales and 
sub-neuronal spatial resolution, it should be possible to uncover 
the neuronal logic governing trajectory control and branch selec- 
tion, which underlies decision making in this system. Measuring 
manifold geometry changes over longer timescales may uncover 
the characteristics of brain states such as hunger-satiety or 
sleep-wakefulness. 
Our results argue against models of largely feed-forward sensory-to-motor
flow where intermediate neuronal layers perform
sequential processing and the behavioral state is only ultimately 
represented within the nervous system at the motor periphery. 
Instead, our data support a model of an early interface between 
sensory and motor representations as was suggested by recent 
single-neuron studies (Hendricks et al., 2012; Luo et al., 2014). 

Moreover, motor command representations affect respon- 
siveness of sensory neurons and early interneurons to sensory 
inputs via feedback mechanisms (Figure S7) that remain to be 
identified (see also Gordus et al., 2015). Consistent with recent 
distributed models of sensorimotor action selection in mammals, 
including primates (Cisek and Kalaska, 2010), our work suggests
that the brain's outputs, i.e., its intents and actions, make up a
large fraction of its dynamic activity state. 

Our findings reveal that a large collection of neuronal classes
with distinct morphologies and connectivities (White et al.,
1986), distinct molecular compositions and neurotransmitter
expression patterns (Hobert, 2013), distinct synaptic transmission
properties (Li et al., 2014), and distinct subcellular signal
processing capacities (Chalasani et al., 2007; Hendricks et al.,
2012; Li et al., 2014) nevertheless collectively share a
low-dimensional, pervasive neuronal signal. The class-specific phase
relationships with respect to the global brain cycle (Figures S1B and
S5) suggest that neurons differentially interact with this shared
mode. We therefore propose that the neural state manifold
influences and binds local activity to a global reference
framework, establishing a consensus that produces stable, coherent
behavior.

EXPERIMENTAL PROCEDURES 

The Supplemental Experimental Procedures contain more detailed
information on each procedure and additionally describe region
of interest detection and neural time series extraction from volumetric Ca2+
imaging data, electrophysiology, simulation of nuclear GCaMP signals from
voltage traces, population behavior assays, statistics applied in this study,
strain genotypes, and molecular biology constructs.

Whole-Brain Imaging of C. elegans Head Ganglia Neurons 

Animals were immobilized with 1 mM tetramisole in microfluidic devices that
allow controlled O2 stimuli as previously described (Schrödel et al., 2013;
Zimmer et al., 2009). Recordings were started within 5 min after removal from food.
Worms were either imaged for 18 min at constant 21% O2 or, for the stimulus
protocol, imaged for 12 min with the first 6 min at 21% O2 and the remaining
6 min with 30 s consecutive shifts between 4% and 21% O2. Data were
acquired using an inverted spinning disc microscope (UltraViewVoX,
PerkinElmer) equipped with an EMCCD camera (C9100-13, Hamamatsu).
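
As an illustration only, the 12 min stimulus protocol described above can be written out as a sampled O2 time course. This is a minimal sketch and not the acquisition software used in this study; the function name `o2_profile`, its parameters, and the 1 s sampling step are hypothetical, and we assume that the first 30 s shift is a down-shift to 4% O2.

```python
def o2_profile(dt_s=1.0, baseline_min=6.0, shift_s=30.0, n_shifts=12,
               low=4.0, high=21.0):
    """Return the O2 concentration (%) at each sample of the stimulus protocol.

    Hypothetical helper: 6 min at 21% O2, followed by 6 min of consecutive
    30 s shifts alternating between 4% and 21% O2 (12 shifts in total).
    """
    n_baseline = int(round(baseline_min * 60.0 / dt_s))
    n_per_shift = int(round(shift_s / dt_s))
    profile = [high] * n_baseline
    for i in range(n_shifts):
        # Assumption: even-numbered shifts are down-shifts to the low level.
        level = low if i % 2 == 0 else high
        profile.extend([level] * n_per_shift)
    return profile


profile = o2_profile()
print(len(profile))  # number of 1 s samples in the full protocol
```

With these defaults the profile contains six down-shift and six up-shift periods, matching the number of periods averaged in Figure 6C.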

Identification of Head Ganglia Neurons 

In each recording, we detected 107-131 neurons, covering 55%-67% of ex- 
pected neurons in the imaging area. Neurons were identified taking into ac- 
count their anatomical positions, also in relation to surrounding neurons 
(http://www.wormatlas.org), and their activity patterns. To confirm ambiguous 
neuron identities, marker lines expressing red fluorophores in neurons of interest
were generated and crossed to the imaging line expressing GCaMP5K
pan-neuronally in the nucleus (ZIM504).

Time Series Analysis: PCA, Numerical Differentiation, 4-Phase 
Segmentation, Phase Timing Analysis, and Clustering 

PCA was performed on the time derivatives of ΔF/F0 neural traces, each
normalized by its peak magnitude. To compute de-noised time derivatives
without the need for smoothing, which can affect the precise timing of sharp
transitions, the total-variation regularization method (Chartrand, 2011) was applied.
To segment individual neuronal activity into 4-phase sequences, first RISE and 
FALL phases for neurons were identified as periods when the time derivative 
was greater or lower than a small threshold, respectively. HIGH and LOW 
phases were then inferred in the remaining gaps. For trajectory segment aver- 
aging (Figures 4C, 4D, and S2E) and generation of Movies S2 and S3, neuronal 
time series were registered to a common phase clock by matching phase 
segment starts and ends to the reference neuron (AVA or RIM) rise onsets 
and fall offsets, respectively, followed by linearly interpolating within phase 
segments. To perform phase timing analysis, first a set of global transitions, 
either RISE or FALL onsets, were defined by the transitions of a reference 
neuron (AVA or RIM in this study). Then, relative time delays of the nearest tran- 
sitions found in other neurons were used to compose a feature vector for each 
global transition. In the absence of a matching transition within 7 s of the
reference neuron transition, a time delay of -10 s was used for the purposes of
clustering, since the absence of a transition in a neuron was also considered an
important feature of transitions. K-means clustering was applied to transition feature
vectors for each full trial using L1 distance and k = 2. Detailed explanations
of the above computational analyses may be found in the Supplemental Exper- 
imental Procedures. 
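
The segmentation and phase timing steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis code used in this study: the function names `segment_phases` and `delay_features` are hypothetical, the threshold is assumed to be applied symmetrically (RISE above +threshold, FALL below -threshold), gaps are assumed to be HIGH after a RISE and LOW after a FALL, and "within 7 s" is taken as an absolute delay of at most 7 s. The clustering step itself (k-means with L1 distance, k = 2) is not reproduced here.

```python
import numpy as np


def segment_phases(deriv, threshold):
    """Label each sample of a de-noised time derivative as RISE, FALL, HIGH, or LOW."""
    labels = ["RISE" if d > threshold else "FALL" if d < -threshold else None
              for d in np.asarray(deriv, dtype=float)]
    gap_after = {"RISE": "HIGH", "FALL": "LOW"}   # assumed gap-filling rule
    gap_before = {"RISE": "LOW", "FALL": "HIGH"}  # for a gap before the first transition
    first = next((lab for lab in labels if lab is not None), None)
    current = gap_before.get(first)  # stays None if the trace has no transitions
    phases = []
    for lab in labels:
        if lab is not None:
            current = gap_after[lab]
            phases.append(lab)
        else:
            phases.append(current)
    return phases


def delay_features(ref_times, other_times, window_s=7.0, missing_s=-10.0):
    """One feature vector per reference transition: delay (s) to the nearest
    transition of each other neuron, or missing_s if none lies within window_s."""
    feats = np.full((len(ref_times), len(other_times)), missing_s, dtype=float)
    for i, t_ref in enumerate(ref_times):
        for j, times in enumerate(other_times):
            times = np.asarray(times, dtype=float)
            if times.size == 0:
                continue
            delay = times[np.argmin(np.abs(times - t_ref))] - t_ref
            if abs(delay) <= window_s:
                feats[i, j] = delay
    return feats


phases = segment_phases([0.0, 0.5, 0.5, 0.0, 0.0, -0.5, 0.0, 0.0], threshold=0.1)
features = delay_features([10.0, 50.0], [[11.5, 49.0], [30.0]])
```

Each row of `features` corresponds to one global transition of the reference neuron and could then be passed to a clustering routine that supports L1 distance.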

Behavioral Decoding of Whole-Brain Recordings 

Each time point of the phase plot trajectory was first assigned to a global brain 
cycle HIGH, LOW, RISE1, RISE2, FALL1, or FALL2 segment as described 
above and in the main text, then mapped to motor command states as follows. 
RISE1 and RISE2 segments were mapped to REVERSAL1 and REVERSAL2 
command states, respectively. HIGH segments were mapped to the 
SUSTAINED REVERSAL state. FALL1 and FALL2 segments were mapped to 
VENTRAL TURN and DORSAL TURN, respectively. LOW segments were map- 
ped to FORWARD except that RIB FALL phases present during global LOW 
segments were mapped to FORWARD SLOWING command states. A speed 
drive was assigned to each point on the trajectory as follows, aside from those 
in SUSTAINED REVERSAL phases for which no speed drive was inferred. Dur- 
ing VENTRAL TURN, DORSAL TURN, FORWARD, and FORWARD SLOWING 
phases, positive speed drive was taken to be the magnitude of RIB activity, 
normalized to its most negative value during the trial. During REVERSAL1 
and 2 phases, negative speed drive was taken to be the derivative of RIM 
neuron activity, normalized to its highest value during the trial. 
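
The mapping from global brain cycle segments to motor command states described above amounts to a lookup table with one exception for RIB FALL phases. The following is a minimal sketch of that mapping only; the names `SEGMENT_TO_COMMAND` and `decode_command` and the boolean flag `rib_falling` are hypothetical, and the speed drive inference is not included.

```python
# Segment-to-command mapping as stated in the text above.
SEGMENT_TO_COMMAND = {
    "RISE1": "REVERSAL1",
    "RISE2": "REVERSAL2",
    "HIGH": "SUSTAINED REVERSAL",
    "FALL1": "VENTRAL TURN",
    "FALL2": "DORSAL TURN",
    "LOW": "FORWARD",
}


def decode_command(segment, rib_falling=False):
    """Map one global brain cycle segment label to a motor command state.

    rib_falling (hypothetical flag) marks time points at which RIB is in a
    FALL phase; it only changes the result during global LOW segments.
    """
    if segment == "LOW" and rib_falling:
        return "FORWARD SLOWING"
    return SEGMENT_TO_COMMAND[segment]


commands = [decode_command(s, r) for s, r in
            [("RISE1", False), ("HIGH", False), ("FALL1", False),
             ("LOW", False), ("LOW", True)]]
print(commands)
```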

Imaging in Freely Moving Animals 

Ca2+ imaging recordings were made using the automatic re-centering system
described previously (Faumont et al., 2011) with custom modifications. Young
adult worms (0-8 eggs) expressed both mCherry and GCaMP in the neuron of
interest. Animals were recorded while freely crawling on agar in a custom-built
microscope stage containing an airtight chamber with inlet and outlet
connectors for gas flow delivery. Images were acquired using two CCD cameras
(Evolve 512, Photometrics) connected via a DualCam DC2 beam splitter
(Photometrics). A long-distance 63x objective (Zeiss LD Plan-Neofluar 63x, 0.75
NA) was used to obtain unbinned images streamed at 30.3 frames per second
(fps) acquisition rate. Simultaneous behavior recordings under infrared
illumination (780 nm) were made using a CCD camera (Manta Prosilica GigE,
Allied Vision Technologies) at 4x magnification and 10 fps acquisition rate.

SUPPLEMENTAL INFORMATION 

Supplemental Information includes Supplemental Experimental Procedures, 
seven figures, one table, and three movies and can be found with this article 
online at http://dx.doi.org/10.1016/j.cell.2015.09.034. 
SUMMARY 

Ethylene is a gaseous phytohormone that plays vital 
roles in plant growth and development. Previous 
studies uncovered EIN2 as an essential signal trans- 
ducer linking ethylene perception on ER to transcrip- 
tional regulation in the nucleus through a “cleave and 
shuttle” model. In this study, we report another 
mechanism of EIN2-mediated ethylene signaling, 
whereby EIN2 imposes the translational repression 
of EBF1 and EBF2 mRNA. We find that the EBF1/2 
3' UTRs mediate EIN2-directed translational repres- 
sion and identify multiple poly-uridylates (PolyU) 
motifs as functional cis elements of 3' UTRs. Furthermore,
we demonstrate that ethylene induces EIN2 to
associate with 3' UTRs and target EBF1/2 mRNA to 
cytoplasmic processing-body (P-body) through in- 
teracting with multiple P-body factors, including 
EIN5 and PABs. Our study illustrates translational 
regulation as a key step in ethylene signaling and 
presents mRNA 3' UTR functioning as a “signal 
transducer” to sense and relay cellular signaling in 
plants. 

INTRODUCTION 

Ethylene is a gaseous phytohormone produced by plants in 
response to various internal and environmental stimuli, which 
triggers a wide range of physiological and morphological re- 
sponses (Johnson and Ecker, 1998). During the past decades, 
a relatively linear ethylene signaling pathway has been estab- 
lished through the application of molecular and genetic ap- 
proaches (Guo and Ecker, 2004). In Arabidopsis, ethylene is 
perceived by a group of ER-located receptors (Chang and 
Stadler, 2001). In the absence of ethylene signal, the hor- 
mone-free receptors activate a Raf-like protein kinase 
CONSTITUTIVE TRIPLE RESPONSE 1 (CTR1) (Gao et al., 
2003; Kieber et al., 1993). Activated CTR1 and the receptors 
cooperatively inhibit an ER-located membrane protein 
ETHYLENE INSENSITIVE 2 (EIN2) through physical interaction 
and protein phosphorylation (Alonso et al., 1999; Bisson and 
Groth, 2011; Ju et al., 2012). 
EIN2 is a key component of the ethylene signaling pathway, as
evidenced by the completely ethylene-insensitive phenotypes of the
ein2 null mutants (Ji and Guo, 2013). It is encoded by a single- 
copy gene in Arabidopsis, and is conserved from charophyte 
green algae to land plants (Ju et al., 2015). While the function of 
its N-terminal membrane-spanning domain is not clear, the C-ter- 
minal end of EIN2 (CEND) is thought to participate in signaling 
output, as ectopic expression of this domain alone can partially 
activate ethylene responses (Alonso et al., 1999; Wen et al., 
2012). Recent studies reported that CEND can be phosphory- 
lated by the receptors-activated CTR1 in the absence of ethylene 
(Ju et al., 2012; Qiao et al., 2012). Upon ethylene application, 
inactivation of the receptors and CTR1 abolishes the phosphory- 
lation state of CEND, leading to its proteolysis from the ER-teth- 
ered N terminus, followed by shuttling into the nucleus (Ju et al., 
2012; Qiao et al., 2012; Wen et al., 2012). However, this “cleave 
and shuttle” mode might represent part of the EIN2 actions, as 
induced nuclear localization of CEND only partially activates 
ethylene signaling (Ji and Guo, 2013; Wen et al., 2012). Mean- 
while, ethylene also induces CEND to form discrete and promi- 
nent foci in the cytoplasm (Qiao et al., 2012; Wen et al., 2012), 
but the function of such cytoplasmic portion remains unexplored. 

In the nucleus, components working downstream of EIN2 are 
two master transcription factors ETHYLENE INSENSITIVE 3 
(EIN3) and its homolog EIN3-LIKE 1 (EIL1), which regulate the 
vast majority of ethylene-directed gene expression (Chang 
et al., 2013; Chao et al., 1997). One of the key regulatory mech- 
anisms of ethylene signaling is the stabilization of EIN3/EIL1 
proteins, wherein ethylene acts to repress the proteasomal 
degradation of EIN3/EIL1 mediated by two F-box proteins, 
EIN3-BINDING F-BOX 1 (EBF1) and EBF2, in an EIN2-dependent 
manner (An et al., 2010; Guo and Ecker, 2003; Potuschak et al., 
2003). However, the molecular mechanism of how ethylene or 
EIN2 represses the function of EBF1/2 is still elusive. 

ETHYLENE INSENSITIVE 5 (EIN5), encoding a cytoplasmic 
5'-3' exoribonuclease (AtXRN4), is another component positively 
modulating ethylene responses (Olmedo et al., 2006; Potuschak 
et al., 2006). Currently, little is known about how EIN5 modulates 
ethylene signaling, except for the genetic evidence suggesting 
its participation in the regulation of EBF1/2 function (Olmedo 
et al., 2006; Potuschak et al., 2006). Notably, small RNA fragments
corresponding to EBF1 and EBF2 mRNA 3' UTR were processed
and accumulated in ein5 (Olmedo et al., 2006; Potuschak
et al., 2006; Souret et al., 2004). Our recent work uncovered that 
Figure 1. Overexpression of EBF1 3' UTR Results in Reduced Ethylene Sensitivity

(A) Schematic diagrams of the gene structure of EBF1 and the 3'-UTR-overexpressing construct. Full-length EBF1 3' UTR (643 bp after stop codon) plus a 66-bp
flanking sequence was inserted into the multiple cloning site (MCS) prior to the NOS terminator in pDr vector. S in open circle, stop codon.

(B) Quantification of 3' UTR transcripts in etiolated seedlings of three independent transgenic lines grown on MS medium with (+) or without (-) ACC (an ethylene
biosynthetic precursor). Vector means pDr-expressing transgenic plants while 3' UTR means 3'-UTR-overexpressing transgenic lines. Arrows denote the primers
used for qRT-PCR to detect the levels of 3' UTR.

(C) The triple response phenotypes of seedlings corresponding to (B).

(D) Quantification of hypocotyl lengths and root lengths of the seedlings in (C). **p < 0.01. Mean ± SD, n > 10.

(E) Immunoblot assays showing EIN3 protein levels of seedlings corresponding to (B). A nonspecific band served as a loading control. The numbers indicate the
relative EIN3 protein levels as calculated from three biological replicates.

(F and H) Schematic maps of M1U (MYC-EBF1 3' UTR) and G1U (GFP-EBF1 3' UTR), as well as two control transcripts MYC and GFP. A(n) represents the poly(A)
tail. Of note, all these transcripts are driven by CaMV 35S promoters.

(G and I) The triple response phenotypes of etiolated seedlings of wild-type Col-0 as well as three independent lines of indicated transgenic plants.

See also Figure S1.



EIN5, in combination with 3'-5' RNA decay pathway, is respon- 
sible for the removal of many defective coding transcripts as 
well as the cleavage fragments of miRNA targets, including 3' 
UTRs, which are otherwise subjected to posttranscriptional 
gene silencing (Zhang et al., 2015). However, genetic evidence 
disfavored the possibility that 3' UTR fragments of EBF1/2 
mRNA are processed and targeted to small RNA-mediated 
gene silencing pathway (Potuschak et al., 2006). Interestingly, 
ectopic expression of a 3' UTR-truncated EBF2 gene resulted 
in a stronger ethylene insensitive phenotype than that of the 
EBF2 full-length gene (Konishi and Yanagisawa, 2008), implying 
a negative role of 3' UTR on the EBF2 function. 

In this study, we sought to investigate the regulatory mecha- 
nisms of how ethylene signal is relayed from cytoplasm to 
nucleus, and how EIN2 and EIN5 participate in this signaling pro- 
cess. Strikingly, we found that ectopic expression of either EBF1 
or EBF2 3' UTR fragments confers strong ethylene-insensitivity 
phenotypes through promoting the translation of endogenous 



EBF1/2 mRNAs. Furthermore, we found that ethylene induces 
EIN2 to target EBF1 3' UTR to cytoplasmic processing-body 
(P-body) through interacting with EIN5 and other P-body factors 
to repress EBF1/2 translation. Our study uncovers another 
branch of ethylene signaling pathway mediated by cytoplasmic 
EIN2 in translational control. 

RESULTS 

Overexpression of EBF1 3' UTR Leads to Reduced 
Ethylene Sensitivity 

Previous studies revealed that the ein5 mutant accumulated 
EBF1/2 mRNA 3' UTR fragments (Olmedo et al., 2006; Potu- 
schak et al., 2006; Souret et al., 2004). We thus speculated 
that the over-accumulated 3' UTR fragments could contribute 
to the ethylene insensitivity of ein5. To test this speculation, we 
overexpressed the EBF1 3' UTR region (1U) in wild-type Col-0 
plants (Figures 1A and 1B). The so-called “triple response” 
phenotype is commonly used as an ethylene-specific growth
response in Arabidopsis; it refers to the exaggerated apical
hooks and shortened hypocotyls and roots of dark-grown seedlings
exposed to ethylene or treated with the ethylene precursor
1-aminocyclopropane-1-carboxylic acid (ACC) (Bleecker et al., 1988;
Ecker, 1995). Overexpression of 1U significantly attenuated
the triple response phenotypes of Col-0, resulting in elongated
hypocotyls and roots compared with control seedlings
(Figures 1C and 1D). Consistently, we found that the levels of
EIN3 protein were lower in 1U transgenic plants than in
Col-0 (Figure 1E).

Furthermore, we fused 1U to the MYC tag and GFP coding
sequence (referred to as M1U and G1U, respectively) (Figures
1F and 1H), and overexpressed these fusion genes in wild-type
Col-0 (Figures S1A-S1C and S1F-S1H). Similar to 1U-overexpressing
seedlings, M1U- and G1U-overexpressing plants displayed
reduced ethylene sensitivity and impaired EIN3 protein
accumulation compared with control plants (Figures 1G, 1I,
S1D, S1E, S1I, and S1J). Together, these results demonstrate
that overexpression of 1U, alone or in fusion with unrelated transcripts,
reduces ethylene sensitivity.

Overexpression of EBF1 3' UTR Promotes the 
Translation of Endogenous EBF1/2 mRNAs 

Interestingly, we found that ethylene hyposensitivity resulting
from 1U overexpression was partially restored by a defect in
either EBF1 or EBF2 (Figure 2A). Due to the fatal effect of
over-accumulated EIN3 in the ebf1 ebf2 double mutant, we next
overexpressed M1U in β-estradiol-inducible EIN3-Flag/ein3
eil1 ebf1 ebf2 (iEIN3/qm) (An et al., 2010), which was used
as a substitute for the lethal ebf1 ebf2 double mutant (Figure
S2A). We found that M1U no longer affected the triple
response phenotypes (Figure 2B), and the abundance of
EIN3 protein was comparable between iEIN3/qm and M1U
iEIN3/qm (Figure S2B). Together, these results demonstrate
that the presence of EBF1/2 is required for the 1U-overexpression-induced
repression of ethylene responses, implying that
exogenous 3' UTR expression modulates the function of
EBF1/2.

We found that the levels of both EBF1 and EBF2 transcripts
were not evidently affected by 1U overexpression
(Figure S2C), excluding the modulation of EBF1/2 at the level
of transcription or RNA decay. We next examined whether
the translation of EBF1/2 mRNAs is regulated.
Because no good antibody against EBF1 or EBF2 was available, two
experiments were conducted for this purpose. Using polysome
profiling assays, we found that the translation of
EBF1 and EBF2 mRNAs was repressed by ethylene, as the
portion of high-density polysome-associated EBF1/2 mRNAs
was decreased upon ethylene application (Figure 2C).
Notably, 1U overexpression reversed the drop in the portion
of polysome-associated EBF1/2 mRNAs (Figure 2C). Therefore,
1U overexpression augments the translation of endogenous
EBF1/2 mRNAs, which is subject to repression by
ethylene.

Furthermore, we constructed transgenic plants harboring
GFP-EBF1 followed by 1U or not (G1F and G1C, respectively)
(Figure 2D), and expressed an inducible 1U (iEBF1U) in these
plants to examine the effect of exogenous 1U expression on
G1F or G1C translation. We found that, while GFP-EBF1
mRNA levels were comparable, GFP-EBF1 protein levels were
downregulated by ethylene and upregulated by 1U overexpression
in G1F plants (Figures 2E and 2F). By contrast, in G1C
plants, the GFP-EBF1 protein levels were virtually unchanged
upon 1U expression regardless of ethylene application (Figures
2E and 2F). Collectively, these results suggest that the
over-accumulation of 1U transcripts boosts the function of EBF1/2
by enhancing their translation.

Based on these observations, we propose a translational interference
model, in which ectopically expressed 1U transcripts interfere
with the endogenous EBF1/2 3' UTRs that supposedly
exert a repressive role on the translation of EBF1/2 mRNAs.
Such translational interference could arise from the competition
for and/or titration of translational repressors binding to the
endogenous 3' UTR regions (Figure 2G).

The 3' UTRs Impart Translational Inhibition to EBF1/2 
mRNAs in Response to Ethylene 

We next tested the translational interference model (Figure 2G)
by examining the effect of EBF1 3' UTR on GFP mRNA translation
(Figures 1H, 1I, and S1F-S1J). We found that, with
comparable transcript levels (Figure S1G), seedlings expressing
G1U accumulated much lower GFP fluorescence or protein
abundance than those expressing GFP alone, particularly when
treated with ACC (Figures 3A-3D). Ethylene caused a decrease of over 80%
in the translational efficiency of G1U, whereas it
had no effect on GFP alone (Figures 3C and 3D). The ACC-promoted
reduction in GFP protein abundance was restored by
the application of the ethylene inhibitor silver ions (Ag+) (Figure 3E).
Taken together, these results indicate that EBF1 3' UTR confers
translational repression to its fusion mRNA in response to
ethylene.

Next, we determined the biological significance of the EBF1 
mRNA 3' UTR-mediated translational repression in ethylene 
signal transduction. We constitutively expressed M1C (MYC- 
EBF1, MYC tag fused with the EBF1 coding sequence) and 
M1F (MYC fused with the EBF1 full-length transcript including 
coding sequence and 3' UTR) (Figure 3F). Compared with control 
plants, M1F expression resulted in reduced ethylene sensitivity, 
whereas M1C expression conferred nearly complete ethylene 
insensitivity (Figure 3G). In agreement with the triple response 
phenotype, the amount of MYC-EBF1 protein was nearly con- 
stant in M1C but progressively decreased in M1F upon treatment 
with increasing doses of ACC (Figure 3H). Given the comparable 
mRNA abundance between M1F and M1C (Figures S3A and 
S3B), we concluded that translational repression of EBF1 
mRNA via its 3' UTR is critical for EBF1 function in ethylene 
signaling. 

We further found that the overexpression of EBF2 3' UTR (2U) 
also led to reduced ethylene sensitivity in GFP-EBF2 3' UTR 
(G2U) transgenic plants (Figures S3C and S3D). Like EBF1 3' 
UTR, EBF2 3' UTR also conferred translational repression to 
the GFP mRNA fused with it (Figure S3E). Thus, the 3' UTRs of 
both EBF1 and EBF2 act similarly to impose translational repres- 
sion to their respective mRNAs in response to the ethylene 
signal. 
Figure 2. Overexpression of EBF1 3' UTR Enhances the Translation of Endogenous EBF1/2 mRNAs

(A and B) Triple response phenotypes of etiolated transgenic seedlings expressing G1U (GFP-EBF1 3' UTR) treated with ACC (A), and seedlings expressing M1U
(MYC-EBF1 3' UTR) treated with ACC in combination with DMSO or β-estradiol (B). iEIN3/qm is the β-estradiol-induced EIN3-Flag in the ein3 eil1 ebf1 ebf2
quadruple mutant background, which was used to substitute for the lethal ebf1 ebf2 double mutant (An et al., 2010).

Figure 3. EBF1 3' UTR Confers Translational Repression to Its Fusion Transcripts in Response to Ethylene

(A and B) GFP fluorescence in the roots of three independent transgenic seedlings expressing GFP or G1U (GFP-EBF1 3' UTR) with (+) or without (-) ACC
treatment (A) and the relative quantifications of GFP fluorescence (B). ***p < 0.001. Mean ± SD, n > 20 roots.

(C) Immunoblot assays showing GFP protein abundance in whole etiolated seedlings with (+) or without (-) ACC treatment.

(D) qRT-PCR analysis of GFP mRNAs and quantification of GFP proteins in (C). The ratio of protein to mRNA abundance was defined as the translation efficiency.
***p < 0.001; calculations based on three biological repeats.

(E) Immunoblot assays showing GFP protein abundance in etiolated seedlings treated with (+) or without (-) ACC and/or silver ion.

(F) Structures of MYC, M1C (MYC-EBF1 CDS), and M1F (MYC-EBF1 full length containing CDS and 3' UTR) transcripts.

(G) Hypocotyl lengths of etiolated seedlings of three independent transgenic lines expressing MYC, M1C, and M1F. Mean ± SD, n > 20. 

(H) Immunoblot assays indicating MYC-EBF1 protein abundances (top) and their relative quantifications (bottom) in seedlings treated with increasing doses of 
ACC. Calculations were based on three biological repeats. 

See also Figure S3. 



EIN2 Is Essential for 3'-UTR-Mediated Translational 
Repression of EBF1 mRNA 

We next investigated the role of key ethylene signaling com- 
ponents in 3'-UTR-mediated translational regulation. The 
ethylene-induced repression of G1U mRNA translation, mani- 
fested by reduced GFP fluorescence, was similarly observed in 



Col-0 and ein3 eil1, but not in ein2 and a receptor mutant etr1 
(Figures 4A and S4A), suggesting that the upstream signaling 
components including the receptors and EIN2 are required for 
3'-UTR-mediated translational repression, whereas EIN3/EIL1 
are not. Expression of a β-estradiol-inducible version of EIN2 
was sufficient to restore such translation inhibition in ein2, and 



(C) Polysome profiling assays with sucrose density gradient accompanied by qRT-PCR to analyze translational status of EBF1/2 mRNAs. A254 absorption was 
monitored together with fractionation (left). The fractions containing 40S, 80S of ribosome, and polysomes are indicated. The abundance of EBF1 and EBF2 
mRNA in each fraction was detected by qRT-PCR and quantified as a percentage relative to their total amount (right). UBQ5 mRNA was used as a reference. 

(D) Structures of iEBF1U (β-estradiol-inducible EBF1 3' UTR) transcript, G1F (GFP-EBF1 full length containing CDS and 3' UTR) and G1C (GFP-EBF1 CDS). 
Arrows indicate the primer pair used to analyze the expression of iEBF1U. 

(E) Coexpression of G1F or G1C together with iEBF1U in etiolated seedlings treated with or without ethylene and β-estradiol for 4 hr before RT-PCR and western 
blotting analysis. Protein loading was manifested by Coomassie brilliant blue (CBB) staining. 

(F) Quantitative measurements of GFP-EBF1 proteins in (E) based on three biological repeats. *p < 0.05; ***p < 0.001. 

(G) A translational interference model proposes that the exogenously overexpressed 3' UTRs enhance the translation of endogenous EBF1/2 mRNAs by 
competing with their inherent 3' UTRs and thus titrating unknown repressor X bound to 3' UTRs. 

See also Figure S2. 

Figure 4. EIN2 Is Required for EBF1 3'-UTR-Mediated Translational Repression 

(A) GFP fluorescence in the roots of etiolated seedlings expressing G1U (GFP-EBF1 3' UTR) in different genotype backgrounds (top). Immunoblot assays showing 
GFP protein abundance in whole seedlings (bottom). 

(B) Structure of the β-estradiol-inducible EIN2-HA gene (iEIN2-HA). 

(C) GFP fluorescence in the roots of etiolated seedlings transiently treated with or without ACC and β-estradiol for 6 hr. “Removed,” removal of both ACC and 
β-estradiol. 

(D) Profiles of polysome-associated EBF1, EBF2, and UBQ5 mRNAs in Col-0 and ein2-5. 

(E) Immunoblot assays showing MYC-EBF1 protein abundance in etiolated seedlings of transgenic plants expressing M1F (MYC-EBF1 CDS+3'UTR) or M1C 
(MYC-EBF1 CDS). 

(F) Immunoblot assays showing MYC-EBF1 and EIN2-HA protein abundance in transgenic plants expressing iEIN2-HA together with M1F or M1C. Note that 
multiple processed C-terminal fragments of induced EIN2-HA (CEND-HA) were also shown. 

(G) Triple response phenotypes of etiolated seedlings corresponding to (F). 

(H) Quantitative measurements of hypocotyls (left) and roots (right) of etiolated seedlings in (G). *p < 0.05; **p < 0.01; ***p < 0.001. Mean ± SD, n > 20. 

See also Figure S4. 

the removal of β-estradiol led to the efficient translation of G1U 
again (Figures 4B and 4C). A similar scenario was observed 
with transiently expressed β-estradiol-inducible EIN2 and G1U 
in tobacco (Figure S4B), supporting that EIN2 is essential for 
EBF1 3' UTR-directed translational repression. 

To gain further evidence for EIN2-regulated EBF1/2 mRNA 
translation, we compared the polysome profiles of EBF1/2 
mRNAs between Col-0 and ein2 (Figure 4D). The polysome profiles 
of EBF1/2 mRNAs remained virtually unchanged in ein2 
when treated with ethylene, in contrast with the apparent 
ethylene-induced polysome profile shifts observed in Col-0 (Fig- 
ures 2C and 4D). Meanwhile, we found that the ethylene-evoked 
translational repression of M1F (MYC-EBF1 full-length tran- 
script) was abolished in ein2 (Figures 4E and S4C), but exacer- 
bated by addition of EIN2 function (Figure 4F). By contrast, the 
translation of M1C (MYC-EBF1 CDS) remained unaffected 
upon depletion or addition of EIN2 (Figures 4E, 4F, and S4C). 
Furthermore, the partial ethylene-insensitivity phenotype of 
M1F transgenic plants was largely suppressed by the overex- 
pression of EIN2, whereas the strong ethylene insensitivity of 
M1C was hardly affected (Figures 4G and 4H). Taken together, 
these results indicate that the 3' UTR is a critical ethylene-responsive 
element to repress EBF1 translation, and EIN2 is necessary 
and sufficient for directing such translational repression. 

EIN2-Directed Translational Repression Is Mediated 
by PolyU Motifs of EBF1/2 3' UTRs 

We next dissected the functional cis elements within the EBF1/2 
3' UTRs by utilizing a dual-construct translation analysis system 
in tobacco leaves, in which a 3' UTR fragment of interest was 
fused with the GFP coding region, together with mCherry as 
the internal control in the same reporter construct (Figures 5A 
and S5A). The GFP intensities relative to mCherry intensities 
were calculated to indicate the translation efficiency of GFP 
mRNA (Figure 5B). Whereas the translation of GFP alone was 
not altered by introduction of EIN2 and/or ACC application, the 
translation of G1U and G2U (GFP fused with EBF1/2 3' UTR, 
respectively) was remarkably repressed by either expression of 
EIN2 or ACC application (Figures S5B-S5D and S5K), and to a 
further extent when combining these two treatments (Fig- 
ure S5C). As a control, expression of EIN3 protein had no effect 
on the translation of G1U (Figures S5E-S5H). These results 
confirmed the inhibitory effect of EBF1/2 3' UTRs on translation 
in an EIN2-dependent manner. 
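
The dual-reporter readout described above can be sketched as follows. This is only an illustration under stated assumptions: the function names and intensity values are hypothetical, and the normalization (relative GFP intensity with EIN2 divided by that without EIN2) is one reading of the legend to Figure 5B rather than the authors' exact procedure.

```python
def relative_gfp_intensity(gfp, mcherry):
    """GFP signal normalized to the mCherry internal control (RGI)."""
    if mcherry <= 0:
        raise ValueError("mCherry intensity must be positive")
    return gfp / mcherry


def normalized_rgi(plus_ein2, minus_ein2):
    """RGI with EIN2 divided by RGI without EIN2.

    Each argument is a (gfp, mcherry) pair of intensities measured from
    the same reporter construct. Values below 1 indicate repression.
    """
    return relative_gfp_intensity(*plus_ein2) / relative_gfp_intensity(*minus_ein2)


# Hypothetical intensities in arbitrary units.
g1u = normalized_rgi(plus_ein2=(200.0, 1000.0), minus_ein2=(800.0, 1000.0))
gfp_only = normalized_rgi(plus_ein2=(790.0, 1000.0), minus_ein2=(800.0, 1000.0))

print(f"G1U: {g1u:.2f}  GFP alone: {gfp_only:.2f}")
```

Normalizing to mCherry expressed from the same construct corrects for differences in transformation efficiency between leaves, so that a change in the ratio reflects the 3' UTR fused to GFP.
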

The EBF1 3' UTR was arbitrarily segmented into five fragments 
ranging from 98 to 150 nt in length (Figure 5C). Three fragments, 
1Ua, 1Ub, and 1Ud, were able to mediate EIN2-induced 
translational repression (Figure 5D). Using the computational tools 
MEME and RNAfold, we identified a total of seven polyuridylate 
motifs in the predicted stem-loop structures within these 
three fragments (Figure 5E). These sequences were designated 
as Ethylene Responsive RNA elements containing Poly-Uridylates 
(ERR-PolyU, or EPU for short) (Figure 5E). Deletion of 
EPUs in each fragment or all seven EPUs in 1U, which did not 
change their overall predicted secondary structures (Figure S5I), 
eliminated EIN2-directed translational repression (Figure 5F). 
Similarly, five EPUs were found in the EBF2 3' UTR (Figure S5J), 
and they were all required for 2U to mediate EIN2-induced 
translational inhibition (Figure S5K). Sequence alignment of EBF 3' 
UTRs from different plant species revealed that PolyU motifs 
are among the most conserved regions (Figures S5L and 
S5M), suggesting that 3'-UTR-mediated translational regulation 
is a well-preserved mechanism of ethylene signaling. 
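
As a rough illustration of what a PolyU scan and deletion involve, the sketch below locates uninterrupted runs of uridines with a regular expression. It is a deliberate simplification and not the MEME/RNAfold analysis used in this study: the minimum run length (`min_len`), the function names, and the example sequence are all hypothetical, and no secondary structure is predicted or checked.

```python
import re


def find_polyu_runs(sequence, min_len=4):
    """Return (start, end, run) for each run of at least min_len uridines.

    Coordinates are 0-based and end-exclusive; DNA input (T) is read as RNA (U).
    """
    rna = sequence.upper().replace("T", "U")
    pattern = re.compile("U{%d,}" % min_len)
    return [(m.start(), m.end(), m.group()) for m in pattern.finditer(rna)]


def delete_polyu_runs(sequence, min_len=4):
    """Return the RNA sequence with every qualifying U run removed."""
    rna = sequence.upper().replace("T", "U")
    return re.sub("U{%d,}" % min_len, "", rna)


# Hypothetical sequence, not taken from the EBF1 3' UTR.
example = "GCAUUUUUGCCAUUGTTTTTTAC"
for start, end, run in find_polyu_runs(example):
    print(start, end, run)
print(delete_polyu_runs(example))
```

A deletion variant produced this way would still need its predicted fold compared with that of the parent sequence, as was done for the EPU deletions in Figure S5I.
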

To further investigate the role of EPUs in relaying ethylene 
signaling, we generated transgenic plants expressing 
either the GFP-EBF1 full-length transcript driven by its own 
promoter (pEBF1::G1F) or a version with all seven EPUs deleted 
(pEBF1::G1FΔ7U) in the ebf1 mutant background. While expression 
of pEBF1::G1F rescued ebf1 to the wild-type level, the 
pEBF1::G1FΔ7U/ebf1 seedlings exhibited nearly complete 
ethylene insensitivity, phenocopying pEBF1::G1C/ebf1 plants 
(GFP-EBF1 CDS driven by its own promoter) (Figure 5G). Consistent 
with the ethylene-response phenotype, the levels of EIN3 
protein were much lower in both pEBF1::G1FΔ7U/ebf1 and 
pEBF1::G1C/ebf1 than in Col-0 or pEBF1::G1F/ebf1, 
whereas the GFP-EBF1 protein was more abundant in the former 
two lines, particularly under ethylene treatment (Figure S5N). 
These results suggest that EPU-mediated translational inhibition 
plays a key part in regulating EBF1 protein abundance as well as 
ethylene signal transduction. 

From 1Ud, we selected a region harboring two EPUs that is predicted 
to form a hairpin structure (Figure 5H), and repeated it 
three times to construct an artificial 3' UTR that possessed six 
EPUs (6x EPU) (Figure 5H). Similar to G1U, the translation of 
GFP-6x EPU mRNA was highly reduced upon EIN2 induction 
(Figure 5I). Furthermore, transgenic overexpression of GFP-6x 
EPU but not GFP-1UΔ7U conferred an ethylene-insensitivity phenotype 
(Figure S5O). Together, these results demonstrate that EPUs 
mediate the EIN2-directed translational repression of EBF1/2, 
which represents a crucial mechanism of ethylene signaling. 

We also examined the functional domain of EIN2 in translational 
repression. By taking advantage of the tobacco system, 
we narrowed down the C-terminal end of EIN2 (CEND) to the 
fragment spanning amino acids (aa) 654-1272 that was required for 
translational repression (Figures 5J, 5K, and S5P). Within this region, 
a predicted nuclear localization signal (NLS, aa 1262-1269, 
LKRYKRRL) was previously identified to be required for the nuclear 
translocation as well as the functionality of CEND (Ju 
et al., 2012; Qiao et al., 2012; Wen et al., 2012). We found that 
deletion or mutation of this NLS region also disrupted the function 
of CEND in translational repression (Figures 5J and 5K). 
Interestingly, replacement of the NLS with a distinct K/R-rich 
NLS sequence (NLS’: KPKKKRKV) was able to relocate CEND 
into the nucleus but failed to restore its translational repression 
ability (Figures 5K and S6G). Together, these results suggest 
that the short motif (aa 1262-1269) is also critical for the translational 
repression function of EIN2, independent of its role as a 
nuclear localization signal. 

Association and Co-localization of EIN2 with EBF1 3' 
UTR in Cytoplasmic Foci 

We next investigated how EIN2 imposes translational repression 
of 1U/2U-containing mRNAs. We first examined whether EIN2 
associates with 1U in vivo. RNA-immunoprecipitation assays 
(RNA-IP) in tobacco leaves indicated that EIN2 preferentially 
associated with mRNAs containing 1U (G1U, M1U), but not 
with GFP mRNA alone, and ethylene enhanced the association 
between EIN2 and G1U mRNA (Figures 6A and S6A). 

Figure 5. PolyU Motifs in EBF1 3' UTR Are Necessary and Sufficient for EIN2-Directed Translational Inhibition 

Next, we sought to examine the subcellular localization and 
dynamics of 1U-containing mRNAs and EIN2. We adopted the 
MS2 system (Bertrand et al., 1998) to directly visualize the subcellular 
localization of 1U-containing mRNAs. In this system, YFP 
was fused to the C terminus of MS2 coat protein (MY), and six 
tandem repeats of MS2 binding sites (6X MS2bs) were inserted 
into M1U to produce a reporter RNA MYC-6X MS2bs-1U 
(M6U), while a reporter RNA MYC-6X MS2bs (M6) served as a 
negative control (Figures 6B, S6B, and S6C). RNA-IP assay re- 
vealed the association of EIN2 and M6U in vivo (Figure S6A), 
and transgenic plants overexpressing M6U showed ethylene- 
insensitivity phenotypes (Figure S6D), demonstrating the func- 
tionality of this fusion RNA. In the absence of ethylene, M6U 
was observed to spread in the cytoplasm and concentrate in 
the nucleus, similar to the distribution pattern of M6 (Figure 6C). 
Notably, ethylene treatment specifically induced M6U but not 
M6 to form granules in the cytoplasm (Figures 6C and 6E). 

Meanwhile, we found that ethylene treatment can also induce 
a proportion of EIN2 to form cytoplasmic foci in addition to its nuclear 
accumulation (Figures S6E and S6F; Movies S1 and S2). In 
the presence of ethylene, a portion of EIN2 protein and M6U 
mRNA were co-localized in cytoplasmic foci (Figure 6D). Further- 
more, the cytoplasmic foci formation of M6U was abolished in 
ein2 (Figure 6E), suggesting the requirement of EIN2 for foci formation 
of 1U-containing mRNA. Taken together, these results 
suggest that ethylene promotes the association of EIN2 with 
1U, which in turn is targeted to cytoplasmic foci. 

We further found that EIN2 CEND (aa 459-1294) as well as the 
minimal functional fragment of EIN2 (aa 654-1272) were also able 
to form cytoplasmic foci under ethylene treatment, whereas all the 
translation-dysfunctional fragments of EIN2, including aa 673- 
1294, deletion or mutation of NLS, failed to form foci in the cyto- 
plasm (Figure S6G). It is noteworthy that the addition of another 
functional NLS sequence (NLS’) could not restore the cytoplasmic 
foci formation of NLS-deleted or -mutated EIN2 (Figure S6G). 
Together with the observations made in Figure 5K, these results 
demonstrate that the NLS sequence of EIN2 is critical for its cyto- 
plasmic foci formation as well as translational regulation function. 

P-Body Is Involved in EBF1/2 3'-UTR-Mediated 
Translational Repression by EIN2 

Given that EBF1/2 RNAs are subjected to the regulation by EIN5, 
an exoribonuclease associated with processing body (P-body) 
(Decker and Parker, 2012; Xu and Chua, 2011), and that 1U-containing 
mRNA forms cytoplasmic foci, we next determined 
whether 1U directs its fusion mRNA to P-body. Upon ethylene 
treatment, both M6U mRNA and EIN2 protein were partly co- 
localized with EIN5 in cytoplasmic foci (Figures 7A and 7B), 
indicative of their P-body localization. Additionally, yeast-two- 
hybrid (Y2H) and luciferase complementation imaging (LCI) 
assay indicated that EIN2 CEND interacted with EIN5 (Figures 
7C and 7D). Co-immunoprecipitation (Co-IP) assays revealed 
that EIN5 associated with EIN2 mainly in the presence of RNA, 
as treatment with RNase largely diminished EIN5-EIN2 associa- 
tion (Figure 7E). Furthermore, we found that several other P-body 
components, such as PAB2, PAB4 and PAB8 (Decker and 
Parker, 2012), also interacted with EIN2 CEND in yeast and plant 
cells in an RNA-dependent manner (Figures 7C, 7D, and S7A- 
S7D). In keeping with these biochemical results, knockout mutants 
of P-body component genes, such as EIN5, PAB2, 
PAB8, and UPF1, exhibited reduced ethylene sensitivity manifested 
by compromised triple response phenotypes, target 
gene expression, and EIN3 protein accumulation (Figures 7F 
and S7E-S7G). The combinations of these mutants led to 
increasing severity of ethylene-insensitivity phenotypes, particu- 
larly for the ein5 upf1 pab2 pab8 quadruple mutant, which ex- 
hibited strong insensitivity to ethylene (Figures 7F and S7E). 
Therefore, several P-body components act cooperatively to 
repress the translation of mRNAs harboring 1U. 

Although UPF1 was not detected to physically interact with 
EIN2, we observed the binding of UPF1 to 1U (Figure S7H), 
consistent with its function as a non-selective RNA binding pro- 
tein (Hogg and Goff, 2010). The comparable mRNA levels of 
EBF1 and EBF2 between the P-body mutants and wild-type 
plants further supported a control of translation rather than transcription 
or RNA decay of EBF1/2 by P-body (Figure S7F). Taken 
together, we propose that after activation by ethylene, EIN2 
CEND associates with the 3' UTR of EBF1/2 mRNAs and targets 
them to P-body via interacting with multiple P-body components, 
thus repressing the translation of EBF1 and EBF2, resulting in 
EIN3/EIL1 accumulation and ethylene responses (Figure 7G). 

DISCUSSION 

A Cytoplasmic Mode of EIN2 Action in Ethylene 
Signaling 

Recently, three groups have uncovered a “cleave and shuttle” 
mode of EIN2 action, wherein its C-terminal end (CEND) is 



EIN2-HA protein. Translational inhibition was calculated by relative GFP intensity in the presence of EIN2-HA and ACC application (RGI of +EIN2) normalized with 
that without EIN2-HA and ACC (RGI of -EIN2) (B). 

(C and D) Fragments of 1U (EBF1 3' UTR) and their effects on the translation of GFP mRNA with or without EIN2 function. *p < 0.05, **p < 0.01, ***p < 0.001. 
Calculations of translational inhibition were based on three biological repeats and the value of GFP control was set as 1. 

(E) The PolyU ethylene responsive RNA elements (termed as ERR-PolyU or EPU) shared in the fragments 1Ua, 1Ub, and 1Ud. 

(F) The effects of EPUs on the translational inhibition. ΔU, deletion of EPU. Δ7U, deletion of all seven EPUs in full-length 1U. ***p < 0.001. 

(G) Triple response phenotype of etiolated seedlings in the presence of ACC. G1F (GFP-EBF1 full length containing CDS and 3' UTR), G1FΔ7U (G1F with all seven 
EPUs deleted), and G1C (GFP-EBF1 CDS) were all driven by native EBF1 promoter (pEBF1) and expressed in ebf1 mutant. 

(H and I) An engineered 6x EPU fragment and its effect on translational inhibition upon EIN2 activation. G and C bases were added to the two EPUs in 1Ud to 
produce a stem-loop structure, which was repeated three times to generate 6x EPU. 

(J and K) Scheme for different EIN2 fragments and their inhibitory effect on G1U (GFP-EBF1 3' UTR) translation. ΔNLS indicates the deletion of the predicted 
nuclear localization signal. NLS' represents a distinct NLS sequence. NLSm8A means the substitution of the NLS motif with eight alanine residues. ***p < 0.001. 
See also Figure S5. 

Figure 6. Ethylene Induces the Association and Co-localization of EBF1 3' UTR with EIN2 in Cytoplasmic Foci 

(A) RNA-IP assays indicating the association between EIN2 and G1U (GFP-EBF1 3' UTR) in tobacco leaves. GFP acts as a negative control. iEIN2-HA, 
β-estradiol-inducible EIN2-HA. 

(B) Schematic diagrams of the MS2/RNA-MS2bs system. MY means MS2 coat protein linked with YFP; M6 and M6U, MYC-6X MS2 binding site and MYC-6X 
MS2 binding site -EBF1 3' UTR, respectively. S in a circle, stop codon. 

(C) YFP fluorescence revealing the subcellular localization of M6 and M6U RNAs in tobacco leaves treated with or without ethylene. Arrows mark cytoplasmic foci, 
while triangles indicate nuclei. 

(D) Co-localization of EIN2-CFP and M6U in tobacco leaves upon ethylene treatment. 

(E) The subcellular localization of M6 or M6U in transgenic Arabidopsis seedlings treated with or without ACC. Right panels are zoom-in images of the boxed areas 
in left. 

See also Figure S6. 

processed and translocated into the nucleus to activate ethylene 
signaling (Ju et al., 2012; Qiao et al., 2012; Wen et al., 2012). 
Here, we report another mechanism of EIN2-mediated ethylene 
signaling, whereby EIN2 imposes translational repression of 
EBF1/2 mRNA in cytoplasmic P-body compartments. This cytoplasmic 
mode of EIN2 action was revealed by several lines of 
evidence: (1) EIN2 and ethylene treatment inhibit the translation 
of EBF1/2 mRNAs. (2) EIN2 is both necessary and sufficient for 
the translational repression of 1U-containing mRNAs. (3) EIN2 
is colocalized and associated with 1U. (4) 1U, EIN2, and 
EIN5 are co-localized in P-bodies upon ethylene treatment. 
(5) EIN2 interacts with several P-body factors including EIN5 and 
PAB2/4/8. (6) Mutations in P-body protein genes led to evident 
ethylene insensitivity, particularly in combinations. Together 
with previous studies, our discovery illustrates that EIN2 guarantees 
the accumulation of key transcription factors EIN3/EIL1 in 
response to ethylene through at least two parallel mechanisms 
(Figure 7G). The cytoplasmic function of EIN2 is critical for 
quickly shutting down the protein synthesis of EBF1/2, leading 
to rapid depletion of EBF1/2 proteins due to their proteasomal 
degradation (An et al., 2010). Meanwhile, a subset of CEND is 
translocated into the nucleus to further stabilize and/or activate 
EIN3/EIL1 directly or indirectly (Ji and Guo, 2013). 

It has been previously reported that ethylene application 
causes polysome prevalence during the ripening of pear and 
avocado fruits, suggesting a positive regulation of translation 
by ethylene (Drouet and Hartmann, 1979; Tucker and Laties, 
1984). In an accompanying study, Merchante et al. (2015) used 
a plant-optimized genome-wide ribosome footprinting tech- 
nique and successfully identified a group of mRNA targets that 
are upregulated or downregulated by ethylene at translational 
level. Of these targets, EBF1 and EBF2 are prominent as 
ethylene-repressed mRNAs that are dependent on EIN2 but 
not EIN3/EIL1, as observed also in our study. Thus, the EIN2- 
dictated translational control represents an early signaling event 
that operates in the cytoplasm either in parallel with or prior to the 
nuclear signaling cascade. Interestingly, this research together 
with previous studies (Qiao et al., 2012; Wen et al., 2012) re- 
vealed that a predicted NLS motif in the very C terminus of 
EIN2 is essential for its functions in both cytoplasm and nucleus. 
Given the recent finding that NLS is also critical for the associa- 
tion between EIN2 and the ethylene receptor ETR1 on ER (Bisson 
and Groth, 2015), it remains to be addressed how such a short 
motif is involved in seemingly distinctive subcellular signaling 
events. 

EBF1/2 3' UTRs Function as Critical Ethylene- 
Responsive and Signal-Relaying Elements 

In mammals, 3' UTRs targeted by microRNAs are critical for the 
regulation of proto-oncogenes and tumorigenesis (Mayr and Bartel, 
2009). Recent efforts were taken to systematically analyze 
human 3' UTRs, and dozens of novel cis-regulatory elements 
were identified that affect mRNA stability and translation (Oiko- 
nomou et al., 2014; Zhao et al., 2014). Our study revealed that 
the 3'-UTR-mediated translational repression of EBF1/2 is vital 
for relaying the ethylene signal in plants. The biological signifi- 
cance of this repression was demonstrated by the findings that 
deletion of EBF1 3' UTR or the EPU motifs greatly enhanced 



the translation of EBF1 mRNA and led to nearly complete 
ethylene insensitivity (Figures 4G and 5G). We further identified 
multiple PolyU motifs in the loop of predicted stem-loop structures 
(EPUs) as functional cis elements shared in EBF1 and 
EBF2 3' UTRs (Figure 5). Considering the conservation of EIN2 
from green algae to land plants (Ju et al., 2015), and of PolyU motifs 
in the EBF 3' UTR sequences from different plant species 
(Figures S5L and S5M), we believe that the 3'-UTR-mediated 
translational regulation might be an evolutionarily widespread 
mechanism of ethylene signaling. 

Furthermore, our study indicates that the ethylene-induced 
EBF1/2 translational repression is likely to be achieved by targeting 
EBF1/2 transcripts into P-body in an EIN2-dependent 
manner. Although our initial in vitro pull-down assays failed to 
detect their direct binding, RNA IP experiment revealed the as- 
sociation of EIN2 with EBF1 3' UTR in vivo (Figure 6A). It raises 
the possibility that some unidentified RNA binding proteins, 
which could specifically recognize PolyU motifs of 3' UTRs, 
directly or indirectly interact with EIN2 and tether it to EBF1/2 
mRNAs (Figure 7G). EIN2, therefore, may act as a hormone-activated 
switch to target EBF1/2 mRNAs (and probably other 
mRNAs as well) to P-bodies via interaction with P-body proteins. 

Cytoplasmic foci, including P-body and stress granules, have 
been observed in plant cells under myriad stress conditions 
(Maldonado-Bonilla, 2014). The importance of P-body in 
ethylene signaling was manifested as mutants of several 
P-body components led to reduced ethylene sensitivity (Fig- 
ure 7F). Therefore, ethylene, well known as a stress hormone, 
might adopt the translational repression mechanism via P- 
body to quickly shut down gene expression under adverse stress 
conditions. 

Utilizing the Translational Interference Effect of 3' UTR 
to Modulate Gene Function 

Overexpression of the coding sequence of a gene has been 
widely utilized as a powerful genetic tool to study gene functions 
in animals and plants (Prelich, 2012). In this study, we 
demonstrated that overexpression of 3' UTR could also result 
in remarkable interference with the function of their cognate 
genes as well as the signaling output. Several lines of evidence 
supported that the exogenous expression of 3' UTR leads to 
the enhancement or de-repression of the endogenous EBF1/2 
mRNA harboring the same or related 3' UTR in a trans-acting 
manner. Given the strong phenotype of 3'-UTR-overexpressing 
transgenic plants, our study offers an alternative tool to study 
and regulate the function of genes in vivo. 

The translational interference effect of 3' UTR illustrated in this 
work is reminiscent of the action of microRNA sponges (Ebert 
et al., 2007) as well as competitive endogenous RNA (ceRNA) 
in mammals (Denzler et al., 2014; Salmena et al., 2011), and microRNA 
target mimics in plants (Franco-Zorrilla et al., 2007), all of 
which share a common underlying mechanism referred to as 
molecular titration (Bosson et al., 2014; Buchler and Louis, 
2008). As such, the accumulation of 3' UTR fragments, as 
observed in ein5 (Souret et al., 2004), might hold biological 
importance, such as to coordinate or buffer the translational 
regulation of related mRNAs. In the future, a more systematic 
identification and study of 3' UTRs in plants and animals would 

Figure 7. EIN2 Co-localizes with EBF1 3' UTR in P-Body and Interacts with Multiple P-Body Components 

(A) Co-localization of M6U (MYC-6X MS2 binding site-EBF1 3' UTR) (green) and EIN5 (red) in P-bodies of tobacco leaves. Arrows mark cytoplasmic foci 
(P-bodies), while triangles indicate nuclei. 

(B) A 3D image showing partial co-localization of EIN2 (red) and EIN5 (green) in P-bodies (arrow). 

(C) Yeast-2-hybrid assays indicating the interactions between EIN2 CEND (889-1294) and EIN5 as well as PAB2/8. 

(D) Luciferase complementation imaging (LCI) assays manifesting the interaction between EIN2 CEND and P-body components in Arabidopsis protoplasts. 
Combinations in the right list show strong interaction, while the others in the bottom box are either negative controls or exhibit no interaction. 

(E) Co-immunoprecipitation assays indicating the association between EIN2 and EIN5 in the presence of RNA. Immunoblot assays showing the amount of 
expressed proteins in tobacco leaf extracts (input) and after IP with anti-HA antibody. HA and CLuc were used as negative controls. RI, RNase inhibitor. 

provide new information about gene functions as well as their 
regulatory mechanisms. 

EXPERIMENTAL PROCEDURES 
Arabidopsis Materials and Growth Conditions 

The ecotype Columbia (Col-0) was the parent line for all mutants and trans- 
genic plants used in this study. Transgenic lines in different genetic back- 
grounds were constructed by genetic crosses. Unless otherwise stated, all 
Arabidopsis seedlings were grown on MS medium supplied with or without 
10 μM ACC, or other chemicals, for 3-4 days. For transient treatments, 
100 μM ACC or 10 ppm ethylene was used for seedlings, and 100 μM ACC 
or 100 ppm ethylene was used for tobacco leaves. 

Polysome Profiling 

Arabidopsis polysomes were fractionated over sucrose gradients as 
described (Missra and von Arnim, 2014) with minor modifications. 3-day-old 
etiolated seedlings were treated with 10 ppm ethylene for 4 hr and then ground 
in liquid nitrogen followed by resuspension in polysome extraction buffer. 
Supernatant was loaded onto a 15%-60% sucrose gradient and spun in a 
Beckman SW40Ti rotor at 40,000 rpm for 4 hr at 4°C. We collected 12 fractions 
by a gradient fractionator. Total RNA in each fraction was isolated using TRI- 
ZOL reagent (Life Technologies) and then subjected to reverse transcription 
and real-time PCR analysis. 
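
The per-fraction quantification described in the legend to Figure 2C (mRNA abundance in each fraction expressed as a percentage of the total across the gradient) can be sketched as below. The function names, the abundance values, and the choice of which fractions count as polysomal are hypothetical; in practice the boundary would be read from the A254 trace.

```python
def fraction_percentages(abundances):
    """Convert per-fraction mRNA abundances to percentages of their total."""
    total = sum(abundances)
    if total <= 0:
        raise ValueError("total abundance must be positive")
    return [100.0 * a / total for a in abundances]


def polysome_share(percentages, first_polysome_fraction):
    """Sum the percentages from the first polysomal fraction (0-based) onward."""
    return sum(percentages[first_polysome_fraction:])


# Hypothetical qRT-PCR abundances for 12 gradient fractions, top to bottom.
air = [1, 1, 2, 4, 6, 6, 10, 15, 20, 15, 12, 8]
ethylene = [2, 4, 8, 14, 16, 16, 10, 9, 8, 6, 4, 3]

# Assumption: fractions 7-12 (index 6 onward) contain polysomes.
for label, profile in (("air", air), ("ethylene", ethylene)):
    share = polysome_share(fraction_percentages(profile), 6)
    print(f"{label}: {share:.1f}% of transcripts in polysomal fractions")
```

A shift of a transcript from the heavy fractions toward the light ones lowers this share and is consistent with reduced translation, whereas a reference transcript such as UBQ5 would be expected to keep a similar distribution.
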

RNA Immunoprecipitation 

4-week-old tobacco leaves were infected with the mixture of two agrobacte- 
rium strains. Two days after agroinfiltration, the tobacco leaves were treated 
with air or ethylene for 4 hr and subsequently collected to be ground in liquid 
nitrogen, and protein/RNA complexes were extracted using two volumes of IP 
buffer. After removal of insoluble debris by centrifugation, cell extracts were 
incubated with anti-HA antibody (Sigma) for 2 hr on ice with occasional gentle 
mixing. The anti-HA-decorated extracts were incubated with pre-washed pro- 
tein G agarose beads. The co-immunoprecipitated RNA was isolated by 
TRIZOL reagent (Life Technologies) and analyzed by qRT-PCR. 

Co-Immunoprecipitation 

One-month-old tobacco leaves were infected with the mixture of three agro- 
bacterium strains. Protein samples prepared from tobacco leaves 48 hr after 
Agrobacterium-mediated infiltration were homogenized in ice-cold IP buffer 
with the volume ratio of 1/2. After centrifugation, lysates were supplemented 
extemporaneously with RNase inhibitor (Promega) or RNase (Promega) and 
then incubated for 2 hr at 4°C under gentle agitation in the presence of EZview 
anti-HA affinity gel (Sigma). Antibody-coupled agarose beads were washed 
and subsequently denatured to detect the IPed proteins using western blot. 

See Supplemental Experimental Procedures for details on the above- 
described materials and methods, as well as additional methods and 
procedures. 

SUPPLEMENTAL INFORMATION 

Supplemental Information includes Supplemental Experimental Procedures, 
seven figures, one table, and two movies and can be found with this article 
online at http://dx.doi.org/10.1016/j.cell.2015.09.037. 

SUMMARY 

The central role of translation in modulating gene ac- 
tivity has long been recognized, yet the systematic 
exploration of quantitative changes in translation at 
a genome-wide scale in response to a specific stim- 
ulus has only recently become technically feasible. 
Using the well-characterized signaling pathway of 
the phytohormone ethylene and plant-optimized 
genome-wide ribosome footprinting, we have uncov- 
ered a molecular mechanism linking this hormone’s 
perception to the activation of a gene-specific trans- 
lational control mechanism. Characterization of one 
of the targets of this translation regulatory machin- 
ery, the ethylene signaling component EBF2, indi- 
cates that the signaling molecule EIN2 and the 
nonsense-mediated decay proteins UPFs play a 
central role in this ethylene-induced translational 
response. Furthermore, the 3'UTR of EBF2 is suffi- 
cient to confer translational regulation and required 
for the proper activation of ethylene responses. 
These findings represent a mechanistic paradigm 
of gene-specific regulation of translation in response 
to a key growth regulator. 

INTRODUCTION 

The plant hormone ethylene plays a central role in coordinating 
the multitude of molecular processes underlying developmental 
programs and environmental responses critical for plant survival 
(Abeles et al., 1992). The plant’s response to ethylene is initiated 
by the binding of this hormone to its cognate receptors— in Ara- 
bidopsis, a small family of five proteins (ETR1, ETR2, ERS1, 
ERS2, and EIN4) with sequence similarity to the bacterial two- 
component histidine kinases (Bleecker et al., 1988; Hua and 
Meyerowitz, 1998). Although some specialization has been 
recognized for the receptors, they are all thought to function pri- 
marily by modulating the activity of the rapidly accelerated fibro- 
sarcoma (RAF)-like kinase CTR1 (Clark et al., 1998). Inactivation 
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of this kinase in the presence of ethylene results in a reduction in 
the phosphorylation levels of the endoplasmic-reticulum-local- 
ized transmembrane protein EIN2 and cleavage and transloca- 
tion of the unphosphorylated C terminus of EIN2 (EIN2C) to the 
nucleus (Ju et al., 2012; Qiao et al., 2012). Downstream of 
EIN2, two different responses have been characterized. On the 
one hand, there is a rapid inhibition of growth that takes place 
within minutes of exposure to the hormone and does not involve 
the key transcriptional regulators EIN3 and EIL1 (Binder et al., 
2004). On the other hand, there are many other, and possibly 
slower, changes induced by this hormone, including transcript 
level alterations in hundreds of genes that do require the function 
of these two transcriptional regulators (Binder et al., 2004; Chang 
et al., 2013). In contrast with the lack of information on the molecular
mechanism behind the fast growth-inhibition response, all
EIN3/EIL1-dependent responses are activated by the aforementioned
translocation of the unphosphorylated EIN2C to the nucleus.
Preventing this translocation stops the activation of
EIN3/EIL1 (Qiao et al., 2012). The F-box proteins ETP1/ETP2 
and EBF1/EBF2 control EIN2 and EIN3 protein abundance, 
respectively (Guo and Ecker, 2003; Potuschak et al., 2003; 
Qiao et al., 2009). Interestingly, EBF2 itself is a direct transcrip- 
tional target of EIN3 (Konishi and Yanagisawa, 2008), suggesting 
the existence of a feedback regulatory loop that quickly 
dampens EIN3 activity shortly after activating this signaling 
cascade. The critical importance of the EIN3 regulation by the 
EBFs is further substantiated by the observation that EBF2 pro- 
tein levels are also modulated by an unknown EIN2-dependent 
mechanism (He et al., 2011). Finally, a P-body-localized 5'-3' 
exoribonuclease EIN5 (also known as XRN4) has also been impli- 
cated in the regulation of the EBF2 activity (Olmedo et al., 2006; 
Potuschak et al., 2006; Souret et al., 2004; Weber et al., 2008). 

Using a plant-optimized ribosome footprinting approach, we 
show that ethylene affects translation of several genes, among 
them the EBFs. The translational regulation of EBF2 is mediated 
by its long 3' UTR and requires the activity of the ethylene 
signaling components EIN2 and EIN5 and the nonsense-mediated
decay proteins UPFs, but not that of the ethylene transcriptional
master regulators EIN3/EIL1. EIN2C can interact with the
3'UTR of EBF2 and localizes to P-bodies. These findings not 
only provide direct evidence for the translation regulation of 
specific genes in response to this hormone but also the conceptual
framework to decipher the molecular mechanism of a previously
proposed branch of ethylene signaling.

RESULTS 

Ribosome Footprinting Unveils a New Translation- 
Based Branch of the Ethylene Response 

To probe the effects of ethylene on translation at the whole- 
genome level, we implemented the ribosome footprinting tech- 
nology, Ribo-seq, which allows for capturing the ribosomal 
load of expressed genes in the genome at a single-codon reso- 
lution (Ingolia et al., 2009). Using Ribo-seq, we looked for 
ethylene-triggered changes in translation rates that could not 
be explained by changes in transcript levels. 

Total mRNA and ribosome footprint analyses were carried out 
in parallel to identify changes in translation efficiency in response 
to ethylene (Figure S1A) (see the Supplemental Experimental 
Procedures). A 4 hr ethylene treatment was selected to capture 
robust early responses and to avoid secondary long-term effects 
of this hormone. The high quality of the Ribo-seq data (Ingolia 
et al., 2009) is evidenced by the abrupt appearance of a footprint 
signal 14-15 nt upstream of the start codon, a rapid decline in 
signal around 14-15 nt upstream of the stop codon, low density 
of footprints in the 5' and 3' UTR, and a strong 3 nt periodicity 
(Figures S1B-S1D), which represents the codon-long step- 
wise movement of the ribosome along the mRNA. None of these 
features were observed in the RNA sequencing (RNA-seq) libraries
(Figures S1B-S1D).
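
The 3 nt periodicity mentioned above can be checked with a very small calculation. The following is only a minimal sketch, assuming that each footprint has already been reduced to a single P-site coordinate on its transcript; the function name `frame_fractions` and the example coordinates are hypothetical and are not taken from the pipeline described in the Supplemental Experimental Procedures.

```python
from collections import Counter


def frame_fractions(p_site_positions, cds_start):
    """Return the fraction of footprints whose P-site falls in frame 0, 1, and 2.

    p_site_positions: transcript coordinates of footprint P-sites (assumed input).
    cds_start: transcript coordinate of the first nucleotide of the start codon.
    """
    counts = Counter((pos - cds_start) % 3 for pos in p_site_positions)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no footprint positions supplied")
    return [counts.get(frame, 0) / total for frame in range(3)]


# Hypothetical P-site coordinates on one transcript whose CDS starts at 100.
positions = [100, 103, 106, 109, 112, 115, 101, 118, 121, 104]
fractions = frame_fractions(positions, cds_start=100)
print(fractions)  # a strong bias toward one frame indicates 3 nt periodicity
```

In a Ribo-seq library most P-sites are expected to fall in a single frame, whereas fragments from an RNA-seq library should be spread roughly evenly across the three frames.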

Ethylene induced global mRNA level changes (Figure S1E and
Table S1) that were followed by concomitant alterations in the
levels of translation (Figures S1E and S1F and Table S1). However,
in agreement with previous comparisons between protein
and RNA levels (de Godoy et al., 2008; Ingolia et al., 2009), the
correlation between the changes in transcript accumulation
and translation levels was relatively poor, with an r² value of
0.22 (Figure S1F), suggesting the existence of a layer of regulation
at the translational level. In fact, we identified several mRNAs
affected by ethylene in their translational efficiency (Table S1).
Importantly, two key ethylene signaling genes, EBF1 and
EBF2, were found in this list of translationally regulated genes
(Table S1). EBF1 and EBF2 encode F-box proteins involved in
the degradation of EIN3/EIL1 in the absence of ethylene. In prior 
studies, the EBF protein levels have been shown to decrease af- 
ter ethylene treatment (Guo and Ecker, 2003; Potuschak et al., 
2003), although the transcript levels of at least EBF2 are known
to increase in response to this hormone (Konishi and Yanagisawa,
2008). After 4 hr of exposure to ethylene, and consistent
with previous reports, we observed an ~1.5-fold increase in
the EBF2 mRNA, yet a surprising 2.8-fold decrease in its translation
efficiency (TE) (Figure 1A and Table S1). Likewise, we
observed a reduction in the TE of EBF1 (Figure 1B) and several
other genes (Figure S2 and Table S1). These ethylene effects
on the translation of EBF1 and EBF2 were further supported by 
the reduction of the relative levels of these mRNAs in the heavy 
fractions of a polysome profile (Figures 1C and 1D). To further
validate these findings, the ethylene effects on TE of six selected
genes, including EBF1, EBF2, and a negative control, RTE1,
were evaluated by calculating the ratio between the expression
level of these genes in polysomal and total mRNA (Figures 1E
and S2D). Although this approach is not as sensitive at detecting
changes in the ribosomal load of an mRNA as are Ribo-seq or 
ribosome profiling, it can accurately quantify alterations in the ra- 
tio of the mRNA subpopulations that are actively engaged in 
translation versus those populations that are non-translating. 
The TE of EBF1 and EBF2 in ethylene decreased nearly to half
of that in air (Figure 1E), confirming the results of Ribo-seq (Figures
1A and 1B) and polysome profiling (Figures 1C and 1D).
Similarly, the TE of the other three selected genes was also
repressed by ethylene, whereas no effect was detected for
RTE1, a transcriptionally induced negative control (Figure S2D
and Table S1). Together, these results suggest that the multitude
of responses triggered by the hormone ethylene is the result of 
regulation of gene expression not only at the transcriptional level 
as shown previously (Chang et al., 2013) but also at the transla- 
tional level. These changes in translation are likely due to shifts in 
the equilibrium of translated and non-translated populations of 
target mRNAs rather than quantitative alterations in the transla- 
tion rates of individual transcripts. These findings also reveal 
that, as in the case of the transcriptional regulation, some of 
the components of the ethylene signal transduction pathway 
are themselves subject to ethylene-triggered translational regu- 
lation, raising the possibility of intricate feedback regulatory 
loops functioning in this signaling pathway. 
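
The relation between the mRNA, footprint, and TE fold changes reported above for EBF2 can be made explicit with a short calculation. This is a sketch of the arithmetic only, assuming that TE is the ratio of footprint level to mRNA level; the function names are hypothetical, and the fold changes and FDRs in Table S1 come from the statistical analysis described in the Supplemental Experimental Procedures, not from this code.

```python
import math


def te_fold_change(mrna_fc, footprint_fc):
    """Fold change in TE, taking TE = footprint level / mRNA level."""
    if mrna_fc <= 0 or footprint_fc <= 0:
        raise ValueError("fold changes must be positive")
    return footprint_fc / mrna_fc


def implied_footprint_fold_change(mrna_fc, te_fc):
    """Footprint fold change implied by given mRNA and TE fold changes."""
    if mrna_fc <= 0 or te_fc <= 0:
        raise ValueError("fold changes must be positive")
    return mrna_fc * te_fc


# Values quoted in the text for EBF2 after 4 hr of ethylene.
mrna_fc = 1.5        # ~1.5-fold increase in mRNA
te_fc = 1 / 2.8      # 2.8-fold decrease in TE
footprint_fc = implied_footprint_fold_change(mrna_fc, te_fc)
print(round(footprint_fc, 2))            # 0.54
print(round(math.log2(footprint_fc), 2))  # log2 scale, negative = fewer footprints
```

Under this definition, a 1.5-fold rise in transcript combined with a 2.8-fold drop in TE corresponds to roughly half as many footprints on EBF2, despite the higher mRNA level.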

The 3' UTR of EBF2 Is Sufficient to Confer Ethylene-
Mediated Regulation of Translation and Is Required for
Proper Plant Responses to This Hormone

Since EBF2 is a key negative regulator of ethylene signaling (Guo 
and Ecker, 2003; Potuschak et al., 2003), we reasoned that the 
observed translational repression of this gene may have a 
significant physiological effect. Although translation regulatory 



Figure 1. Translation of EBF2 and EBF1 Is Quickly Downregulated by Ethylene 

(A and B) Normalized distribution of RNA-seq and Ribo-seq reads in air and in ethylene along the EBF2 (A) and EBF1 (B) genes. 

5' UTR, coding DNA sequence (CDS), 3' UTR, and introns are marked as white, black, and gray boxes and a line, respectively. The fold change and the associated 
false discovery rate (FDR) for the ethylene effect on transcript and footprint levels, as well as the fold change in the footprint levels given the levels of mRNA (TE) 
and the corresponding FDR, are shown. 

(C) 10%-50% sucrose gradient absorbance (A254) profiles of ribosome complexes obtained from Arabidopsis seedlings grown in air and/or 4 hr ethylene.

(D) Polysomal distribution of EBF transcripts in air and 4 hr ethylene. a-h correspond to fractions 4+5 through 18+19 shown in (C) pooled in pairs. EBF mRNA
levels were normalized against At4g34270. EBF expression in each polysomal fraction was calculated as the percentage of its expression in total RNA.

(E) TE of the EBF2 and EBF1 mRNAs, calculated as their relative expression in polysomal/total RNA fractions, in seedlings grown in air or treated with 10 ppm of
ethylene for the last 4 hr of the experiment. Expression levels of EBFs were normalized against At4g34270. (a) indicates a significant difference of the ethylene 
effect on the EBF TE (t test, p < 0.05). Bars represent means ± SEM for three biological replicates. 

3-day-old etiolated seedlings were used in all of the experiments. 
elements can be located in both 5' and 3' UTRs (Szostak and Gebauer,
2013), we decided to investigate the potential regulatory
role of the atypically large 590-bp-long 3' UTR of EBF2 
(3'EBF2) first, as it has previously been implicated in modulating 
the activity of this gene and prior efforts to determine the mech- 
anism of such regulation via changes in EBF2 mRNA stability 
were not conclusive (Potuschak et al., 2006). GFP reporter was 
fused to either 3'EBF2 or the NOS terminator and placed under 
the control of the constitutive 35S promoter (Figure 2A). The ef- 
fect of ethylene on the GFP fluorescence of stably transformed 
wild-type plants was examined under the standard ethylene tri- 
ple response assay conditions. While the transgenic lines with 



Figure 2. The 3'EBF2 Is Sufficient to Confer
Ethylene-Mediated Regulation of Translation

(A and B) (A) Hypocotyl fluorescence and (B) its
quantification (n = 15) in 3-day-old etiolated seedlings
grown in the presence (ACC) or absence (MS) of the
ethylene precursor ACC and harboring either the
35S:GFP-NOS or the 35S:GFP-3'EBF2 constructs as
depicted on top of the photos. GFP fluorescence is
expressed as the % of fluorescence in ACC compared
to that in MS controls. Bars represent means ± SD. a
and b indicate a significant effect of the ethylene
treatment on the levels of fluorescence (t test,
p < 0.005 and p < 0.0001, respectively).

(C) Anti-GFP western blot of total protein extracts from 
the transgenic lines shown in (A). 

(D) Relative expression of GFP mRNA from two 
selected lines from (A). Bars represent means ± SEM 
for three biological replicates. Expression levels of 
the EBF2 transgenes were normalized against 
At5g44200. 

3-day-old etiolated seedlings were used in all of the 
experiments. 



the NOS terminator alone were equally fluo- 
rescent in control media and in media sup- 
plemented with the ethylene precursor 
ACC, the 3'EBF2 lines showed a strong 
reduction in the levels of fluorescence in 
ACC (Figures 2A and 2B). Western blot with 
an anti-GFP antibody confirmed that the 
observed decrease in fluorescence in the 
latter was due to a reduction in the amount 
of GFP protein (Figure 2C), whereas the 
NOS terminator line showed equal amounts 
of GFP protein in the presence and absence 
of ACC. As expected, we observed a 
range of GFP protein levels in different trans- 
genic lines, and in general, lower levels were 
found in the 3'EBF2 lines even in the
absence of exogenous ethylene, observations
that likely resulted from positional effects
and endogenous ethylene. To further
demonstrate that this ethylene effect in the
3'EBF2 lines was due to changes at the level
of protein translation rather than transcrip- 
tion or mRNA stability, the GFP mRNA was 
quantified by qRT-PCR. The differences in 
the GFP protein accumulation could not be explained by an 
ethylene-mediated effect on the mRNA levels (Figure 2D), which 
is consistent with previous reports that did not detect an effect of 
ethylene on the mRNA stability of EBFs (Potuschak et al., 2006). 

To better understand the role of 3'EBF2 in mediating the
observed ethylene effect on translation, we took a complementary
approach utilizing previously generated transgenic lines
(Konishi and Yanagisawa, 2008). In these lines, the ebf2 mutant
is complemented with either a native genomic construct of EBF2
or a similar construct in which 3'EBF2 was replaced by the NOS
terminator (Figure 3A). Transgenic lines complemented with the 
native genomic construct showed a clear reduction in the TE of 
Figure 3. The 3'EBF2 Is Required for the Proper
Translation of EBF2 and Plant Response to
Ethylene

(A) TE of the EBF2 mRNA, calculated as the relative
expression in polysomal/total RNA fractions, in ebf2
seedlings complemented with EBF2p:EBF2-3'EBF2 and
EBF2p:EBF2-NOS constructs grown in air or treated with
10 ppm of ethylene for the last 4 hr of the experiment.
(a) indicates a significant difference of the ethylene effect
in the different genotypes (ANOVA, p < 0.01). Bars
represent means ± SEM for three biological replicates.
Expression levels of the EBF2 transgenes were normalized
against At4g34270.

(B) Representative images of 3-day-old etiolated seedlings
of the indicated genotypes grown in control media
(MS) or in the presence of 10 μM ACC (ACC).

(C) EBF2 mRNA expression levels in 3-day-old etiolated
seedlings of the same genotypes as shown in (B)
normalized against At5g44200 and expressed relative to
the average of Col-0 plants.

Error bars represent means ± SEM for three technical
replicates.



EBF2 (Figure 3A), equivalent to that of the native EBF2 in wild- 
type plants (Figure 1). However, ethylene had no effect on the 
TE of EBF2 in the NOS terminator lines (Figure 3A). Taken
together, these results demonstrate that 3'EBF2 is sufficient to
confer ethylene-mediated translational regulation. 
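
The polysomal/total ratio used as a TE proxy in Figures 1E and 3A can be illustrated as follows. This is a minimal sketch under stated assumptions: expression is taken from qRT-PCR threshold cycles (Ct) with the common 2^-ΔCt transformation and perfect amplification efficiency, which the text does not specify; the function names and all Ct values are hypothetical. Only the normalization against a reference gene (At4g34270 in the figure legends) and the polysomal/total ratio are taken from the source.

```python
def relative_expression(ct_target, ct_reference):
    """Expression of a target relative to a reference gene (2^-dCt, assumed method)."""
    return 2.0 ** (ct_reference - ct_target)


def polysomal_te(ct_target_poly, ct_ref_poly, ct_target_total, ct_ref_total):
    """TE proxy: normalized expression in polysomal RNA over that in total RNA."""
    poly = relative_expression(ct_target_poly, ct_ref_poly)
    total = relative_expression(ct_target_total, ct_ref_total)
    return poly / total


# Hypothetical Ct values for one gene; the reference gene stays at Ct 20.
te_air = polysomal_te(24.0, 20.0, 24.0, 20.0)
# In ethylene the total mRNA doubles (Ct drops by 1) but polysomal mRNA does not.
te_ethylene = polysomal_te(24.0, 20.0, 23.0, 20.0)
print(te_air, te_ethylene, te_ethylene / te_air)
```

With these illustrative numbers the transcript level rises while the polysome-associated amount stays constant, so the ratio halves; an unchanged ratio, as seen for the NOS terminator lines, would indicate no ethylene effect on TE.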

We re-examined the ethylene response of the aforementioned 
ebf2 lines expressing either the native EBF2 genomic construct
or the corresponding control with the 3'EBF2
replaced by the NOS terminator (EBF2-NOS
lines). As previously reported (Konishi and Yanagisawa,
2008), the lines with the native
3'EBF2 showed normal ethylene response in
the classical triple response assay, whereas 
the NOS terminator lines showed strong 
ethylene insensitivity (Figure S3A). Although 
previously these phenotypes were attributed 
to the slightly elevated levels of EBF2 mRNA 
in the NOS terminator lines (Konishi and Yana- 
gisawa, 2008), our results suggested that the 
ethylene insensitivity of these lines could also 
be due to the loss of the 3' UTR-mediated 
translational repression of EBF2 by ethylene 
(Figure 3A). To distinguish between these two 
possibilities, we generated additional trans- 
genic lines in a wild-type genetic background 
using either the native EBF2 or the EBF2-NOS
terminator constructs. No correlation between the
ethylene phenotype and the levels of EBF2
mRNA was observed (Figures 3B and 3C).
The biological significance of the regulatory 
role of 3'EBF2 was further supported by the 
observation that plants expressing GFP- 
3'EBF2 under the strong 35S promoter dis- 
played mild ethylene insensitivity (Figure S3B). 
These results suggest that the presence of 
high levels of 3'EBF2 can interfere with the molecular machinery
responsible for the translational repression of the endogenous 
EBF2 mRNA. Taken together, the findings described above 
strongly support the idea that the translational regulation 
conferred by 3'EBF2 is critical for the proper function of the 
ethylene signaling pathway and the plant response to this 
hormone. 
Figure 4. EBF2 Displays Complex Transcriptional and Translational
Dynamics in Response to Ethylene

(A) Relative expression levels of EBF2 mRNA in total and polysomal RNA
fractions from 3-day-old etiolated Col-0 seedlings during a time-course
treatment using 10 ppm of ethylene. (a) indicates significant difference between
that time point and the corresponding “Air” control (t test, p < 0.05).
Bars represent means ± SEM for three biological replicates. Expression levels 
of EBF2 were normalized against At4g34270. 

(B) Anti-GFP western blot in 35S:GFP-3'EBF2 of total protein extracts from 
3-day-old etiolated seedlings during a time-course ethylene withdrawal 
experiment. 



The Dynamics of Transcriptional and Translational 
Regulation of EBF2 Shed New Light on Molecular 
Mechanisms of the Ethylene Response Kinetics 

Our results regarding the opposite effects of a short ethylene 
exposure on EBF2 expression at the transcriptional and translational
level, together with the known role of EBF2 in the control of
EIN3 activity, suggest the existence of a regulatory mechanism
involved in the dynamic aspects of the ethylene response. To 
investigate the role of transcriptional and translational regulation 
of EBF2 in the observed kinetics of the ethylene response, we 
examined by qRT-PCR the levels of EBF2 mRNA in both total
and polysomal RNA fractions at different times after initiating
the ethylene treatment. In agreement with previous reports
(Chang et al., 2013), the levels of EBF2 mRNA quickly increased,
reaching the highest expression in the total RNA sample 4 hr
after the beginning of the ethylene treatment, staying high at
the 12 hr time point and slowly decreasing thereafter (Figure 4A).
In contrast, polysomal EBF2 mRNA remained low for the first 4 hr 
(Figure 4A), despite the high levels of total EBF2 mRNA. Lack of 
efficient translation of EBF2 (a negative regulator of ethylene re- 
sponses that targets EIN3 for degradation) thus allows for a full- 
scale ethylene response in early stages of exposure of plants to 
ethylene. Interestingly, this period of low EBF2 accumulation co- 
incides with the previously reported maximum in EIN3 activity 
(Chang et al., 2013). Only after a prolonged ethylene exposure 
(12 hr to 24 hr) did we observe an increase in EBF2 mRNA accumulation
in the polysomal fraction (Figure 4A), which correlates
with the previously described decrease in EIN3 activity (Chang 
et al., 2013) and coincides with a parallel decline in the total 
mRNA levels of EBF2 (Figure 4A) (Chang et al., 2013). Thus, 
the attenuation of the ethylene response under continuous expo- 
sure to this hormone is preceded by an increase in the mRNA 
levels of EBF2 in the polysomal fraction (Figure 4A), suggesting 
that the dynamic balance between transcriptional and transla- 
tional activity of EBF2 plays a critical role in diminishing the 
ethylene response upon prolonged exposure to the hormone. 

To examine the reversibility of the ethylene effect on transla- 
tion, we performed a time-course recovery experiment using 
the p35S:GFP-3'EBF2 lines (Figure 4B) that can monitor the
ethylene effect specifically on translation — i.e., in the absence 
of transcriptional regulation. We compared the accumulation of 
the GFP protein in seedlings grown in air, exposed to ethylene 
for the entire duration of the experiment (72 hr), or exposed to 
ethylene for 71, 70, or 68 hr and then allowed to recover in the
absence of the hormone for the last 1, 2, or 4 hr of the total
72-hr-long experiment, respectively (Figure 4B). As shown in Figure
4B, in spite of the attenuation process described above,
ethylene was able to nearly completely suppress the translation
of the 3'EBF2-containing mRNA expressed under a strong
constitutive promoter even after 72 hr of constant exposure to 
the hormone. Importantly, the protein levels of GFP rapidly 
increased after ethylene was removed, reaching maximum levels 
just 2 hr after the withdrawal of ethylene (Figure 4B). These re- 
sults support the idea that the translation regulation conferred 
by 3'EBF2 plays a role in re-establishing homeostasis upon 
removal of ethylene. Importantly, the analysis of the ebf2 mutant 
has previously implicated this gene in the resumption of growth 
after ethylene withdrawal (Binder et al., 2007), further supporting 
the physiological significance of the observed translation dy- 
namics of this gene. 

The Ethylene-Triggered Regulation of Translation of 
EBF2 mRNA Is EIN2 Dependent but EIN3/EIL1 
Independent 

To determine which canonical components of this hormone 
signaling pathway are required to mediate the translational regu- 
lation of EBF2 mRNA, we examined the expression of the 
35S:GFP-3'EBF2 construct in the strong ethylene signaling mutants
ein2-5 and ein3-1 eil1-1 (Figure 5). The GFP fluorescence of
Figure 5. The Ethylene-Dependent Regulation
of Translation of EBF2 and EBF1 Is
EIN2 Dependent but EIN3/EIL1 Independent

(A-D) (A) Hypocotyl fluorescence, (B) its quantification,
and (C and D) TE of the endogenous
EBF2 (C) and EBF1 (D) mRNA in 3-day-old etiolated
Col-0, ein2-5, and ein3-1 eil1-1 seedlings
harboring 35S:GFP-3'EBF2 and grown in 5 mg/l
silver, air, or in 10 μM ACC.

(B and C) GFP fluorescence (B) was quantified 
across multiple seedlings (n = 7) and expressed 
as the percentage of fluorescence in ACC 
compared to that in silver controls. Error bars in (B) 
represent means ± SD. (a) indicates a significant 
effect of the ethylene treatment on the levels of 
fluorescence (one-way ANOVA, p < 0.0001). TE 
was calculated as the relative expression in poly- 
somal/total RNA fractions. Col graphs are the 
same as those shown in Figure 1, plotted here 
again to facilitate their comparison with the mu- 
tants. (a) indicates a significant difference of 
the ethylene effect on the EBF2 mRNA TE (t test, 
p < 0.05). Error bars in (C) and (D) represent 
means ± SEM for three technical replicates. 
Expression levels of the EBF transgenes were 
normalized against At4g34270. 
wild-type, ein2, and ein3 eil1 seedlings homozygous for the
transgene was examined (Figures 5A and 5B). A silver-treated 
control was included to mitigate the effect of elevated endoge- 
nous levels of ethylene in ein2 and ein3 mutants (Guzman and 
Ecker, 1990; Vandenbussche et al., 2012). While ethylene had 
a dramatic effect on the levels of GFP fluorescence in the wild- 
type plants, we did not observe any changes in the GFP intensity 
in the ein2 plants treated either with silver or with the ethylene 
precursor ACC (Figures 5A and 5B). Surprisingly, the levels of 
fluorescence in the ein3 eil1 double mutant were clearly affected 
by ethylene (Figures 5A and 5B). GFP fluorescence in this mutant 
was high in silver (where the effect of endogenous ethylene was 
suppressed) but dramatically decreased in plants grown in 
ACC-supplemented or un-supplemented media (where the 
high levels of endogenous ethylene were sufficient to trigger a
full response). These results indicate
that, while the function of EIN2 is required 
for the 3'EBF2-mediated ethylene-trig- 
gered regulation of translation, EIN3/EIL1 
are not. 

Next, we investigated the effects of the 
ein2 and ein3 eil1 mutants on the TE of the 
endogenous EBF2 and EBF1 in plants 
grown in silver- or ACC-supplemented 
media (Figures 5C and 5D). In full agree- 
ment with the results obtained for the 
GFP fluorescence of the 35S:GFP- 
3'EBF2 reporter construct, the TE of 
EBFs did not significantly change in the 
presence of silver or ACC in the ein2 
mutant, but a robust reduction in TE typi- 
cally observed in wild-type plants in 
response to ethylene was also seen in the ein3 eil1 mutant (Fig- 
ures 5C and 5D). 

To determine the molecular mechanism by which EIN2 mediates
translational repression of the EBF2 mRNA, both miRNA-based
and protein/RNA interaction-based mechanisms were
considered. A negative outcome of prior efforts to elucidate 
the possible regulation of EBF2 by miRNA (Souret et al., 2004), 
together with the lack of known or predicted miRNA in the Arabi- 
dopsis genome likely to target EBF2 (Alves et al., 2009), made us 
disfavor the miRNA possibility. Nevertheless, we decided to 
examine the ethylene response of the strong small RNA biogenesis
mutant dcl2-1 dcl3-1 dcl4-2 (Henderson et al., 2006).
Consistent with the idea that small RNAs are not involved in 
the regulation of translation of EBF2, the mutant displayed 
a wild-type level of ethylene sensitivity in the standard triple
Figure 6. The EIN2C Interacts with the 3'EBF2 mRNA and Localizes to P-Bodies

(A) Yeast three-hybrid assay of the interaction between 3'EBF2 and EIN2C. Activity of the reporter genes for interaction between the RNA bait and the protein prey
is shown (HIS3 activity [growth] on His- media, left, and β-galactosidase [blue color] in X-gal, right). All yeast strains employed harbor the DNA binding domain of

(legend continued on next page) 
response assay (Figure S3C). These results are in agreement
with the previous observation that hen1, rdr2, dcl2, dcl3, sde1,
and dcl4 mutants are not impaired in their response to ethylene
(Potuschak et al., 2006). 

Since EIN2 is the most downstream known signaling compo- 
nent required for the translational regulation of EBF2, we decided 
to examine the interaction between the signal transducer (i.e.,
EIN2C; Alonso et al., 1999) and the 3'EBF2 mRNA. Using the
yeast three-hybrid system (SenGupta et al., 1996), we were 
able to detect interaction of EIN2C with two different RNA hy- 
brids of 3'EBF2, but not with the antisense version of this 
3' UTR (Figure 6A). Next, we investigated if EIN2C from plant ex- 
tracts could bind its target EBF2 mRNA in vitro. Using Nicotiana 
benthamiana, we transiently expressed 35S:GFP-EIN2C or the 
negative control 35S:GFP-PDC2 and measured the capacity of 
the tagged proteins to bind to the in-vitro-transcribed 3'EBF2 
RNA in an RNA-immunoprecipitation assay (Figure 6B). While 
we could detect the RNA for 3'EBF2 in the samples with
EIN2C, we were not able to detect its presence in the PDC2 con- 
trol samples. These results suggest that EIN2C could bind to 
mRNAs in the cytoplasm and regulate their translation. An 
obvious implication of this mechanistic model is that EIN2C 
should be localized not only in the nucleus, as previously reported
(Qiao et al., 2012), but also in the cytosol. To test this possibility,
we reexamined the subcellular localization of transiently
expressed GFP- or mCherry-tagged EIN2C in Arabidopsis protoplasts
and/or tobacco leaves. As reported previously, EIN2C
was nuclear localized (Figures S4A and S4B). We reasoned 
that, perhaps, under standard conditions, only a small fraction
of EIN2C is localized in the cytosol, and/or only transiently. Two
different approaches were used to enhance the activity of 
EIN2C in the cytosol. First, we examined the subcellular localiza- 
tion of (1) GFP-tagged EIN2C in tobacco leaves co-transfected 
with a construct expressing mCherry-3'EBF2 under the strong
35S promoter (Figure S4B) and of (2) mCherry-tagged EIN2C in 
Arabidopsis protoplasts co-transfected with a construct ex- 
pressing CFP-MBS-3'EBF2 under the strong 35S promoter (Fig- 
ure S4C). While, in Arabidopsis protoplasts, we were not able to 
consistently detect a significant alteration in the EIN2C subcellu- 
lar distribution, with EIN2C detected mainly in the nucleus (Fig- 
ure S4C), in tobacco leaves, EIN2C was consistently localized 
both in the nucleus and in the cytoplasm where it formed distinct 
fluorescent foci (Figure S4B). Next, we examined the subcellular 
localization of mCherry- or GFP-tagged EIN2C in protoplasts obtained
from the Arabidopsis ein5-1 mutant known to accumulate
high levels of 3'EBF2 (Potuschak et al., 2006). As shown in Figure
6C, the subcellular distribution of tagged EIN2C in ein5
dramatically shifted from nuclear to dual nuclear/cytoplasmic
localization. As in tobacco, EIN2C in ein5 was not uniformly 
distributed in the cytosol but rather formed punctate aggregates. 
In contrast, localization of the GFP fusion protein expressed from 
the control 35S:GUS-GFP construct was not affected by the ein5 
mutation (Figures 6C and S4D). The EIN2C aggregates were 
found to correspond to P-bodies by co-localization experiments 
between EIN2C-mCherry and the P-body markers TZF1-GFP 
and DCP2-GFP (Goeres et al., 2007; Pomeranz et al., 2010) (Figure
6C). Furthermore, 3'EBF2 localized to similar cytoplasmic
granules both in ein5-1 protoplasts (Figure 6C) and wild-type 
tobacco leaves (Figure S4E). These results not only support 
the idea that EIN2C is part of an RNA-protein complex localized 
to the P-bodies but also suggest that the ethylene defects of ein5 
could be the consequence of an overload of the translation regu- 
lation machinery similar to what is observed in plants overex- 
pressing GFP-3'EBF2 under the strong 35S promoter (Figure S5). 
Alternatively, or perhaps in addition to this effect, the ethylene 
insensitivity of ein5 could also be caused by the disruption of 
the normal trafficking of EIN2C to the nucleus. Consistent with 
the idea that, in ein5, the translation regulatory machinery
involved in the control of EBF2 mRNA translation is overloaded 
and, therefore, defective, we observed that, in the ein5 mutant, 
the effect of ethylene on the GFP fluorescence level of 
35S:GFP-3'EBF2 and on the TE of the endogenous EBF2
mRNA was significantly reduced compared with the responses 
observed in the corresponding wild-type controls (Figure S5). 

Having established that the ethylene responsive element 
mediating the EIN2-dependent translation regulation is located 
in the 3'UTRs of EBF2 and EBF1 mRNAs, we searched for a 
conserved sequence motif. Using the MEME motif finder (Bailey 
and Elkan, 1994), a conserved motif present multiple times in 
these two genes was identified (Figure 6D). Importantly, using 
the AME package (McLeay and Bailey, 2010), this motif was 
shown to be significantly enriched (p value = 2.63e-3) among 
the 3' UTRs of the genes translationally regulated by ethylene 
(Table S1).
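
To illustrate what an enrichment p value of this kind expresses, the sketch below applies a one-sided hypergeometric test to the question of whether motif-containing 3' UTRs are over-represented in a regulated gene set. This is a swapped-in textbook test, not the scoring or statistics that AME itself applies, and all counts are hypothetical; they are not taken from Table S1.

```python
from math import comb


def hypergeom_enrichment_p(population, population_hits, sample, sample_hits):
    """One-sided P(X >= sample_hits) when `sample` items are drawn without
    replacement from `population` items, `population_hits` of which carry the motif."""
    if not (0 <= population_hits <= population and 0 <= sample <= population):
        raise ValueError("inconsistent counts")
    upper = min(sample, population_hits)
    total = comb(population, sample)
    tail = sum(
        comb(population_hits, k) * comb(population - population_hits, sample - k)
        for k in range(sample_hits, upper + 1)
    )
    return tail / total


# Hypothetical counts: 20 expressed 3' UTRs, 5 with the motif;
# 4 translationally regulated genes, 3 of which carry the motif.
p_value = hypergeom_enrichment_p(population=20, population_hits=5,
                                 sample=4, sample_hits=3)
print(round(p_value, 4))
```

A small p value indicates that the motif occurs among the regulated 3' UTRs more often than expected from its background frequency.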

Translation Regulation and Ethylene Responses Are 
Disrupted in the upf2 Mutants 

In a parallel approach to identify additional genes involved in the 
ethylene response, five mutants were found and ordered in three 
complementation groups (Figures 7A, S6A, and S6B). Map-based
cloning and identification of additional insertional alleles
showed that the causal mutations resided in the three core com- 
ponents of the nonsense-mediated RNA decay machinery, 
UPF1, UPF2, and UPF3 (Figure S6B and Table S3). Based on 



LexA fused to the bacteriophage MS2 coat protein (MCP). The additional constructs specific for each strain are the positive control prey protein (AD-IRP), the 
positive control RNA bait {IRE-MBS), the GAL4-activation domain fused to EIN2C (AD-EIN2C), and three different 3'EBF2 RNA baits with the MS2 binding site 
{MBS) fused in the sense orientation to the 5' {MBS-3'EBF2) and 3' {3'EBF2-MBS) ends or in the antisense orientation {MBS-as3'EBF2). NG indicates no growth. 

(B) RNA immunoprecipitation of GFP-EIN2C and the control protein GFP-PDC2 purified from transfected tobacco leaves and incubated with the in-vitro-tran- 
scribed 3'EBF2 mRNA. The protein and RNA levels in the input and those retained in the anti-GFP column are shown. 

(C) Representative images of ein5-1 mesophyll protoplasts transfected with the indicated constructs. TZF1 and DCP2 were used as P-body markers. tdMCP is 
the tandem version of MCP. MBS corresponds to 24 copies of the MCP binding sequence. The white scale bar represents 25 μm.

(D) Predicted ethylene-responsive translation cis-regulatory element. MEME motif finder identified a consensus sequence present in the 3' UTRs of EBF1 and
EBF2 mRNAs and significantly enriched in the 3'UTRs of genes regulated at the translational level by ethylene. The sequence of the ethylene responsive motif 
(inner panel) and the secondary structure of the 3'UTR of EBF1 (with the motif-matching sequences highlighted in yellow) are shown. 
Figure 7. Ethylene-Triggered Root Growth Inhibition and Translational Regulation of EBFs Are Disrupted in the upf2 Mutant

(A) Representative images of 3-day-old etiolated seedlings of the indicated genotypes grown in the presence of 10 μM ACC.

(B) Normalized distribution of RNA-seq and Ribo-seq reads in air and in ethylene along the EBF2 (left) and EBF1 (right) genes in upf2-10. 5' UTR, CDS, 3' UTR, and
intron are marked as white, black, and gray boxes and a line, respectively. The fold change and the associated FDR for the ethylene effect on transcript and 
footprint levels, as well as the fold change in the footprint levels given the levels of mRNA (TE) and the corresponding FDR, are shown. 

our findings on the role of ethylene in the regulation of translation
of EBF2 mRNA through its atypically long 3' UTR, the lack of 
ethylene effects on the EBF mRNA stability (Potuschak et al., 
2006), and the known role of the UPFs in inhibiting translation 
and targeting to the P-bodies of mRNAs with long 3' UTRs, we 
decided to investigate the function of the UPFs in the translational
regulation of EBFs. Ribo-seq experiments in the hypomorphic
allele of UPF2, upf2-10, clearly show that the translational
regulation by ethylene of EBF2, EBF1 (Figure 7B and Table 
S2), and several other ethylene-regulated mRNAs (Figure S2) 
was dramatically attenuated. Similarly, the analysis of GFP fluo- 
rescence from the 35S:GFP-3' EBF2 construct (Figures 7C and 
7D) and the quantification of TE of the endogenous EBF mRNA 
in response to ethylene (Figures 7E and 7F) also show that the 
upf2-10 mutation attenuates the ethylene-induced translational 
regulation of the EBF mRNA. These findings, together with our 
results that translational regulation mediated by EIN2 does not 
require functional EIN3/EIL1, suggest that the function of UPFs 
needed for proper ethylene response is also required upstream 
of EIN3/EIL1. Importantly, we observed that, as in the case of 
EIN2C, the subcellular localization of UPF1 is also altered in 
the ein5 background (Figure S4F), changing from a uniform nuclear/cytosolic
distribution to markedly punctate foci. This subcellular
localization of UPF1 partially overlaps with that of EIN2C (Fig- 
ure S4G), suggesting P-body localization. Finally, we analyzed 
the kinetics of the ethylene response and recovery of both upf2 
and the mildly ethylene-insensitive transgenic plants expressing
35S:GFP-3'EBF2. This analysis shows that both the upf2 mutant 
and the 35S:GFP-3'EBF2 transgenic lines show similar defects 
during the recovery process after ethylene exposure (Figures 
S3D and S6C). 
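The translation efficiency (TE) comparisons above reduce to a ratio of ratios: footprint (or polysomal) signal divided by total mRNA signal, compared between ethylene and air. A minimal sketch of that arithmetic follows; the function names and all input values are hypothetical and are not measurements from this study.

```python
from math import log2


def translation_efficiency(footprint_level, mrna_level):
    """TE as translated signal (footprints or polysomal RNA) per unit of total mRNA."""
    return footprint_level / mrna_level


def log2_te_change(fp_air, mrna_air, fp_ethylene, mrna_ethylene):
    """log2 fold change in TE caused by ethylene relative to air."""
    te_air = translation_efficiency(fp_air, mrna_air)
    te_ethylene = translation_efficiency(fp_ethylene, mrna_ethylene)
    return log2(te_ethylene / te_air)


# Hypothetical normalized values, for illustration only.
wild_type = log2_te_change(fp_air=100.0, mrna_air=50.0, fp_ethylene=50.0, mrna_ethylene=100.0)
mutant = log2_te_change(fp_air=100.0, mrna_air=50.0, fp_ethylene=90.0, mrna_ethylene=60.0)
print(f"wild type: {wild_type:.2f}, mutant: {mutant:.2f}")
```

In this toy case the wild type shows a fourfold drop in TE (log2 change of -2), while the mutant shows a much smaller drop, which is the kind of attenuation described for upf2-10.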

DISCUSSION 

The response of plants to the hormone ethylene has been exten- 
sively studied, and a linear signaling pathway responsible for 
triggering the multitude of responses to this hormone has been 
identified. Importantly, all known gene-expression changes trig- 
gered by this hormone require the entirety of the pathway, 
including EIN2 and the master transcriptional regulators EIN3 
and EIL1 (Chang et al., 2013; Olmedo et al., 2006). Other facets 
of the ethylene response, however, have been shown to require 
the activity of EIN2, but not of EIN3 or EIL1 (Binder et al., 2004). 
Thus, non-transcriptional responses to ethylene have been 
postulated to exist and originate from a signaling pathway 
diverging at the level or downstream of EIN2 and, therefore, 
not including EIN3/EIL1. Although the existence of this parallel 
pathway was proposed more than 10 years ago (Binder et al., 
2004), the mechanistic understanding of such a signaling process



has remained obscure. Our finding that ethylene alters the TE of 
specific genes provided missing evidence to start to uncover the 
molecular nature of the postulated parallel pathway. The detailed 
characterization of the ethylene-mediated translational regula- 
tion of EBF2 has revealed that this non-transcriptional ethylene 
effect was indeed EIN3/EIL1 independent and EIN2 dependent. 
Hence, the ethylene-triggered changes in translation fulfilled all 
the pre-requisites of a long-anticipated branch of the ethylene- 
signaling pathway diverging at the EIN2 level. Furthermore, we 
were able to show that this translation-based signaling branch 
plays a significant physiological role in the ethylene response. 
For example, removal of 3'EBF2 resulted in the loss of translational
responsiveness of this gene to ethylene and, consequently,
dramatic alterations of the plant response to this
hormone. In particular, our results indicate that the translational 
regulation of EBF2 by ethylene plays a role in the still poorly 
understood process of plant recovery upon withdrawal of the 
hormone. 

Our results also implicate the key signaling component EIN2 in 
the translational regulation of gene expression in response to 
ethylene. Previous studies have shown that EIN2C moves to 
the nucleus in the presence of ethylene and that this translocation
is required for the activation of the EIN3/EIL1-dependent
transcriptional changes (Qiao et al., 2012). Here, we have shown 
that EIN2 must also function in a cytosolic process of transla- 
tional control. Although EIN2 has been implicated in the plant 
response to a variety of stimuli (Gazzarrini and McCourt, 2003), 
conclusive evidence for an EIN2 role beyond ethylene signaling 
is still missing. The finding that EIN2C regulates translation 
opens new opportunities to investigate the full functional spec- 
trum of this enigmatic protein. An additional mechanism can 
now be envisioned by which other signals impinge on ethylene 
signaling — i.e., by altering the translational regulatory activity of 
EIN2. We also found that the EIN2C localizes to cytoplasmic 
P-bodies under certain circumstances, such as in ein5 mutants 
lacking the 5'-3' XRN4 exoribonuclease activity (Potuschak 
et al., 2006; Souret et al., 2004). The observation that EIN2C is 
retained in the cytosol of ein5 protoplasts suggests a possible 
mechanistic explanation for the ethylene insensitivity of this clas- 
sical ethylene signaling mutant. 

In addition to uncovering the EIN2C accumulation in P-bodies, 
we also showed that EIN2C has the capability to interact, directly 
or indirectly, with the 3'EBF2 mRNA, as also suggested by the 
results from the accompanying paper by Li et al. (2015) in this 
issue of Cell. In either case, these results, together with the 
finding that the EIN2 function is required for the translational 
regulation of EBF2, raised the question of how EIN2 influences 
translation activity of its RNA targets. It is possible that the 3' 
UTR-bound EIN2C directly or indirectly modulates the activity 



(C and D) (C) Representative image and (D) GFP fluorescence of multiple seedlings (n = 7) expressed as the percentage of fluorescence in ACC compared to that 
in the MS controls. Error bars represent means ± SD. (a) indicates a significant effect of ethylene on the levels of fluorescence (t test, p < 0.05). 

(E and F) TE of EBF2 (E) and EBF1 (F) mRNA, calculated as the relative expression in polysomal/total RNA fractions, in 3-day-old etiolated Col-0 and upf2-10 
seedlings grown in air (Air) or treated with 10 ppm of ethylene for the last 4 hr of the experiment (Ethylene). 

The asterisk (*) indicates a significant difference of the ethylene effect on the EBF TE between Col and upf2-10 (two-way ANOVA, p < 0.05). (a) indicates a 
significant difference of the ethylene effect on the EBF TE in the indicated genotypes (t test, p < 0.05). 

The Col measurements in (E) and (F) are the same as in Figure 1D, plotted here again to facilitate the comparison between Col and upf2-10. Expression levels of
the EBF transgenes were normalized against At4g34270.
of a component of the general translational machinery, thus
selectively inhibiting the translation of its targets. In fact, some 
of the best-documented examples of gene-specific translation 
regulation involve the direct interaction of an RNA-binding pro- 
tein with particular 3' UTR sequences and a subsequent recruit- 
ment of general translation regulators (Szostak and Gebauer, 
2013). For example, the Drosophila Bicoid protein directly binds 
to the 3' UTR of the embryo-patterning mRNA caudal. This inter- 
action, however, is not sufficient to repress the translation of 
caudal, and Bicoid has to recruit the CAP-binding protein 
4EHP that (due to its low affinity for the translation initiation factor 
eIF4G) attenuates the rates of translation of the Bicoid targets by
failing to recruit the eIF3-containing 43S translation initiation
complex (Cho et al., 2005). It is interesting to note here that 
UPF1 has also been shown to repress translation initiation by 
directly interacting with eIF3 and, thus, to prevent the formation
of the 43S translation initiation complex (Isken et al., 2008). We 
have provided experimental evidence linking UPF function not
only with the ethylene response, but also, more specifically, 
with the translational repression of EBF2 by this hormone. 
Furthermore, we show that, under certain experimental conditions,
such as in plants lacking functional EIN5, the 3'EBF2-binding
protein EIN2C and UPF1 co-localize in P-bodies. Based on
this, we propose a mechanistic model (Figure S7) in which the 
binding of EIN2C to 3'EBF2 triggers the recruitment of the 
UPFs to this mRNA, which in turn results in the inhibition of trans- 
lation initiation by interfering with the formation of the 43S com- 
plex. Although our initial attempts to show a direct interaction 
between EIN2C and the UPFs by means of the yeast two-hybrid assay
have failed, it is still possible that EIN2C directly or indirectly re- 
cruits the UPFs, perhaps, as it has been suggested by Li et al. 
(2015) in the accompanying paper, via a yet-uncharacterized 
RNA-binding protein that recruits EIN2 to its target mRNAs. It 
is also important to point out that the relatively weak ethylene de- 
fects observed in the upf mutants are likely the result of the hypo- 
morphic nature of the alleles identified in our screen, as well as 
the fact that the function of UPFs is required for the translational 
effect of EIN2C but not necessarily for its activation of the EIN3/ 
EIL1 activity. We have focused here on the regulation of EBF2, 
but it would be interesting to study other translationally regulated 
genes identified herein and explore their role in ethylene-related 
processes, including transcription-independent fast growth inhi- 
bition response (Binder et al., 2004). Finally, additional studies on 
the temporal dynamics of the transition of the translationally 
regulated mRNAs from polysomes to P-bodies in ethylene and 
back to polysomes upon ethylene withdrawal will be necessary 
to extend the mostly static single-time-point studies described 
herein. 

EXPERIMENTAL PROCEDURES 

Plant Growth and Ribosome Footprinting 

Plant growth conditions and hormonal treatments of Arabidopsis seedlings
were as described (Stepanova et al., 2005). Ribosome footprinting (Ingolia 
et al., 2009) was carried out using pelleted polysomes (Mustroph et al., 
2009) with the following modifications. Polysomes were isolated in Extraction 
Buffer (100 mM Tris-HCl [pH 9], 10 mM Tris-HCl [pH 7.4], 100 mM sucrose,
100 mM KCl, 75 mM NaCl, 20 mM MgCl2, 12.5 mM EGTA [pH 8], 3 mM DTT,
6.25 μl/ml detergent mix [20% (w/v or v/v) of each of the four detergents in



water: Brij-35, Triton X-100, Igepal CA 630 and Tween 20], 25 μl/ml Triton
X-100, 37.5 μg/ml cycloheximide, 25 μg/ml chloramphenicol), and the digestion
with the RNase I was carried out in a volume of 4.5 ml. After digestion,
monosomes were re-pelleted and purified by sucrose gradient fractionation. 
RNA fragments corresponding to the ribosome footprints were recovered 
from the purified monosomes and sequenced as described (Ingolia et al., 
2009). Data processing was performed using a combination of custom- 
made Perl scripts, as well as R and Bioconductor programs. 

Immunoblot and qRT-PCR 

Protein samples were prepared by homogenizing the liquid nitrogen-ground 
tissues in 2x SDS-PAGE sample buffer (Laemmli, 1970) and boiling the ho- 
mogenate for 5 min. Proteins were separated through a 12% SDS-PAGE 
gel, transferred to a nitrocellulose membrane, and hybridized to anti-GFP 
antibodies (Living Colors A.v. Monoclonal Antibody, Clontech). 

Total RNA was extracted as previously described (Reuber and Ausubel, 
1996). Polysomal RNA was isolated by pelleting polysomes (Mustroph et al., 
2009) and then extracting the RNA by the SDS/acid phenol method (Ingolia 
et al., 2009). Reverse transcription and qPCR (Applied Biosystems) were per- 
formed according to manufacturer’s recommendations. Primer sequences are 
listed in the Supplemental Experimental Procedures. 

Yeast Three-Hybrid and RNA Immunoprecipitation 

The yeast three-hybrid system (Bernstein et al., 2002) was used to test the 
interaction between the EIN2C fragment (amino acids 459 to 1278) and 
3'EBF2 RNA. Interaction was inferred based on the activity of LacZ and 
HIS3 reporters as described (Deplancke et al., 2006). 

RNA immunoprecipitation assay was performed as described (Nicaise et al., 
2013). Protein extracts from Nicotiana benthamiana leaves expressing 
35S:GFP-EIN2C-pGWB6 or a negative control 35S:GFP-PDC2-pGWB6 (Stepanova
et al., 2011) were incubated with anti-GFP-TRAP-A beads (Chromotek)
and 50 μg of 3'EBF2 RNA synthesized in vitro using RiboMAX Large Scale
RNA Production System-T7 (Promega). After extensive washes, RNA-protein 
complexes were eluted from the beads by incubating at 60°C for 15 min in 
200 μl of Elution Buffer (1% SDS, 0.1 M NaHCO3) and treated for 1 hr at
60°C with 40 μg Proteinase K, followed by SDS/Phenol RNA extraction,
reverse transcription (Applied Biosystems), and 30 cycles of qPCR (Power 
SYBR green Master Mix, Applied Biosystems). 

Protoplast and Tobacco Transient Expression Assays 

Protoplasts were isolated using the tape-Arabidopsis sandwich method (Wu 
et al., 2009) and transfected according to a published protocol (Yoo et al., 
2007). Transient expression in Nicotiana benthamiana leaves was performed 
as described elsewhere (Wang et al., 2015). 

Imaging was done using a Leica DFC365 FX camera attached to a compound 
microscope DM5000 with the following filters: GFP filter cube (Ex 470/40 Em
525/50), CFP filter cube (Ex 436/20 Em 480/40), and TX2 filter cube (Ex 560/ 
40 Em BP645/75). The Objective HCX PLAPO 40X/0.10 was used. 

A more detailed description of the materials and methods is provided in the 
Supplemental Experimental Procedures. 
The accession number for the sequencing data reported in this paper is NCBI
SUMMARY

Most human transcripts are alternatively spliced, 
and many disease-causing mutations affect RNA 
splicing. Toward better modeling the sequence de- 
terminants of alternative splicing, we measured the 
splicing patterns of over two million (M) synthetic 
mini-genes, which include degenerate subsequences 
totaling over 100 M bases of variation. The massive 
size of these training data allowed us to improve 
upon current models of splicing, as well as to gain 
new mechanistic insights. Our results show that the 
vast majority of hexamer sequence motifs measur- 
ably influence splice site selection when positioned 
within alternative exons, with multiple motifs acting 
additively rather than cooperatively. Intriguingly, 
motifs that enhance (suppress) exon inclusion in 
alternative 5' splicing also enhance (suppress) exon 
inclusion in alternative 3' or cassette exon splicing, 
suggesting a universal mechanism for alternative 
exon recognition. Finally, our empirically trained 
models are highly predictive of the effects of naturally 
occurring variants on alternative splicing in vivo. 

INTRODUCTION 

Alternative splicing is a major source of proteome diversity in eu- 
karyotes (Nilsen and Graveley, 2010). Regulation of alternative 
splicing is vital to cellular processes that depend on the precise 
ratios of isoforms. For example, mutations that lead to even subtle
changes in the ratio of MAPT isoforms 3R and 4R cause an
inherited form of dementia (Garcia-Blanco et al., 2004). While
new sequencing technologies have enabled the comprehensive 
cataloging of human genetic variation, the functional conse- 
quences of these variants on even molecular phenotypes, such 
as alternative splicing, remain poorly predictable. 

Experimentally testing the consequence of every possible 
genetic variant on endogenous alternative splicing is impractical, 
motivating the development of predictive models of the “splicing 
code.” The core splicing signals— 5' splice donor, 3' splice 
acceptor, branchpoint, and polypyrimidine tract— form the basis 
of the splicing code; they are required for recognition of intron-
exon boundaries and for correct intron removal by the splicing 
machinery. Computational methods have been developed to 
score the likelihood of splicing at different splice donor and 
acceptor sequences (Yeo and Burge, 2004). Splice regulatory 
elements (SREs)— sequence motifs in exons or introns shown 
to regulate splicing— form the next level of regulatory informa- 
tion. SREs typically regulate alternative splicing by binding 
trans-acting splice factor proteins (Ule et al., 2006; Wang et al.,
2013). Depending on their position and mode of action, SREs 
are classified as exonic splice enhancers (ESEs), exonic splice 
silencers (ESSs), intronic splice enhancers (ISEs), or intronic 
splice silencers (ISSs). Examples of SREs have been identified 
computationally by analyzing motif enrichment near splice sites 
(Castle et al., 2008; Fairbrother et al., 2002; Zhang and Chasin, 
2004) or sequence conservation between species (Goren et al., 
2006). Recently, a deep neural network was trained on exon 
skipping events in the genome to generate a comprehensive 
model of the splicing code that can be used to predict exon in- 
clusion percentages (Xiong et al., 2014). Despite this progress, 
current models of alternative splicing do not perform well enough 
to be used in clinical genetics (e.g., to reclassify “variants of un- 
certain significance”), and many machine learning strategies 
result in “black boxes” that limit mechanistic insight. 

We hypothesized that a model of alternative splicing learned 
from very large libraries of synthetic sequences could outper- 
form models trained only on the genome. Current technology 
makes it possible to create and test gene libraries with millions 
of synthetic sequences— orders of magnitude more than the 
number of alternative splice events in the human genome. In 
other applications of machine learning, such as computer vision, 
predictive power has increased greatly with access to larger 
datasets (Le et al., 2012). Previous work supports the idea that 
synthetic gene libraries with extensive and targeted variation 
can provide mechanistic insight into biological phenomena. 
In vivo (Culler et al., 2010; Wang et al., 2012) and in vitro (Yu 
et al., 2008) randomized selections have identified potential 
SREs. Massively parallel reporter assays (MPRAs) that combine 
next-generation sequencing with extensive variation have been 
applied to study transcription (Melnikov et al., 2012; Patwardhan
et al., 2012; Patwardhan et al., 2009; Sharon et al., 2012; Smith
et al., 2013; White et al., 2013), translation (Noderer et al., 2014),
mRNA stability (Oikonomou et al., 2014), and even alternative 
Figure 1. A Predictive Model of Alternative
Splicing Learned from Millions of Synthetic 
Sequences 

(A) Two libraries with either alternative 5' or 
3' splice sites were constructed with two 25-nt 
randomized regions. The library was transfected 
into human cells, and massively parallel mea- 
surement of isoform ratios was performed with 
RNA-seq. These two datasets were used to learn a 
predictive model of alternative splicing. The model 
takes a sequence as input, which is then con- 
verted to 6-mer features. A score for each 6-mer is 
learned and then used to predict the fractional 
usage of each splice site. 

(B) When human sequence variants are fed to 
the model as inputs, the model makes more ac- 
curate predictions than the current state of the art 
algorithms. 






splicing (Ke et al., 2011). However, MPRA studies to date have 
overwhelmingly focused on measuring the consequences of var- 
iants in endogenous sequences (e.g., saturation mutagenesis) or 
on validating predicted activities (e.g., enhancers predicted by 
the ENCODE project). There are thus far few, if any, examples 
of predictive biological models learned entirely on MPRA data. 

To test whether it is possible to learn predictive biological 
models from synthetic data alone, we developed an MPRA 
that measures alternative splice site selection in a highly com- 
plex library of “degenerate introns” (Figure 1A). We added 
degenerate regions into an otherwise fixed sequence context, 
ensuring that any differences in gene expression can be causally 
attributed to the degenerate region. We created two libraries, 
one with alternative 5' splice donors consisting of 265,137 members
and one with alternative 3' splice acceptors containing
2,211,739 members. We transfected these libraries into human
cells, performed RT-PCR and RNA sequencing (RNA-seq) 
to quantitatively measure isoform ratio for all mini-genes and 
used the results to learn a predictive model of alternative 
splicing. To assess the quality of the resulting model, we pre- 
dicted the effects of human sequence variants on isoform levels 
and compared our results to available experimental data (Figure 1B).
We tested variants in alternative 5' splicing events,
both within the alternative splice donors themselves and within 
the alternative exon. Although our MPRA did not include a skip- 
ped exon library, our model also predicted with high accuracy 
the effect of sequence variants in skipped exons. 
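The model outline given in Figure 1A (sequence, then 6-mer features, then a learned score per 6-mer, then predicted splice site usage) can be sketched as follows. This is a minimal illustration under stated assumptions: the logistic link, the function names, and the toy scores are ours, not the fitted model described in the paper.

```python
from collections import Counter
from math import exp


def kmer_counts(seq, k=6):
    """Count every overlapping k-mer in a sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def predict_usage(seq, scores, bias=0.0, k=6):
    """Sum per-k-mer scores additively and squash to a 0-1 usage fraction.
    The logistic link is an assumption made for this sketch."""
    total = bias + sum(scores.get(kmer, 0.0) * n for kmer, n in kmer_counts(seq, k).items())
    return 1.0 / (1.0 + exp(-total))


# Hypothetical scores for two 6-mers; real scores would be learned from the library data.
toy_scores = {"GAAGAA": 1.2, "TAGGGT": -0.9}
example_seq = "ACGAAGAAGAACT"
print(kmer_counts(example_seq)["GAAGAA"], round(predict_usage(example_seq, toy_scores), 3))
```

Because the scores are summed, two copies of an enhancing 6-mer contribute twice the score of one copy, which mirrors the additive behavior of motifs reported in the Summary.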

RESULTS 

Molecular Phenotyping of Millions of Alternatively 
Spliced Mini-Genes Containing Random Sequences 

We chose to study both alternative 5' and alternative 3' splice 
site selection. In the case of alternative 5' splicing, we first generated
a complex library by introducing 2 x 25 nt fully degenerate regions into a
single-intron plasmid mini-gene (Figure 2A).
Specifically, the intron was designed 
with two competing splice donors sepa- 
rated by 44 nt; one degenerate region was inserted between 
the splice donors and the other downstream of the second 
donor. Neither degenerate sequence overlapped a splice donor. 
The mini-genes contained an additional degenerate 20 nt 
barcode in the 3' UTR. This barcode was used to create a 
look-up table linking barcodes and intronic sequences. Thus, 
even when both degenerate regions were spliced out, their se- 
quences could be recovered from the barcode sequence (Fig- 
ure 2A). To maximize intron sequence variability, we constructed 
and sequenced a complex library of 265,137 such mini-genes. 
Thus, over 13 Mb of unique intronic sequence are represented 
within the degenerate regions of this library (265,137 x 50 nt). 

In the case of alternative 3' splicing, we inserted 2 x 25 nt 
fully degenerate regions into a single-intron system designed 
to have two alternative 3' splice sites (Figure 2C). The degen- 
erate regions did not overlap either splice acceptor, but the 
upstream degenerate region did overlap the typical position 
of the first splice acceptor’s branchpoint (-44:-19 relative
to SA1). Similarly to the alternative 5' library, we included
an additional degenerate 20-nt barcode in the 3' UTR. The 
alternative 3' library contained 2.2 million unique mini-genes 
encompassing over 110 Mb of unique sequence variation 
(2,211,739 X 50 nt). 
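The library sizes quoted above follow from simple multiplication, which the short sketch below reproduces. The constant and function names are ours; the barcode figure is just the number of possible 20-nt sequences, not a measured diversity.

```python
REGION_LENGTH_NT = 25       # length of each degenerate region
REGIONS_PER_MINIGENE = 2    # two degenerate regions per mini-gene
BARCODE_LENGTH_NT = 20      # degenerate barcode in the 3' UTR


def degenerate_bases(n_minigenes):
    """Total degenerate bases across a library of n_minigenes members."""
    return n_minigenes * REGION_LENGTH_NT * REGIONS_PER_MINIGENE


five_prime_bases = degenerate_bases(265_137)
three_prime_bases = degenerate_bases(2_211_739)
barcode_space = 4 ** BARCODE_LENGTH_NT

print(f"5' library: {five_prime_bases / 1e6:.1f} Mb")
print(f"3' library: {three_prime_bases / 1e6:.1f} Mb")
print(f"possible 20-nt barcodes: {barcode_space:.2e}")
```

The barcode space is several orders of magnitude larger than either library, so random barcodes are unlikely to collide by chance.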

We transfected the pooled libraries of plasmids into HEK293 
cells and then quantified isoform ratios with targeted RNA-seq. 
To identify both the isoform and originating plasmid of each 
mRNA, we used paired-end sequencing with one read across 
the exon junction and the other read across the 3' UTR barcode 
(Figures 2A and 2C). We used 13 million reads for the alternative
5' library and 5.4 million reads for the alternative 3' library. We 
were then able to calculate the isoform ratios for each mini- 
gene in each library. We averaged 50.0 reads per mini-gene in 
the 5' library with reads mapping to 265,044/265,137 (99.96%) 
of all mini-genes. On the other hand, in the 3' library we averaged 
only 2.47 reads per mini-gene with reads mapping to 1,686,096/
2,211,739 (76.23%) of all mini-genes.
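The barcode look-up step described above can be sketched as a simple tally: one read of each pair identifies the isoform, the other identifies the mini-gene through its barcode. The data structures, names, and toy reads below are hypothetical; the actual pipeline is not described at this level of detail in the text.

```python
from collections import Counter, defaultdict


def isoform_fractions(read_pairs, barcode_to_minigene):
    """Tally splice-site usage per mini-gene from (barcode, isoform) read pairs."""
    counts = defaultdict(Counter)
    for barcode, isoform in read_pairs:
        minigene = barcode_to_minigene.get(barcode)
        if minigene is None:  # barcode absent from the look-up table: skip the read
            continue
        counts[minigene][isoform] += 1
    fractions = {}
    for minigene, tally in counts.items():
        total = sum(tally.values())
        fractions[minigene] = {iso: n / total for iso, n in tally.items()}
    return fractions


# Hypothetical toy data: short barcodes and isoform labels for illustration only.
lookup = {"AAAC": "minigene_1", "GGTT": "minigene_2"}
reads = [("AAAC", "SD1"), ("AAAC", "SD2"), ("AAAC", "SD2"), ("AAAC", "SD2"),
         ("GGTT", "SD1"), ("TTTT", "SD2")]
usage = isoform_fractions(reads, lookup)
print(usage)
```

With only about 2.5 reads per mini-gene, as in the 3' library, per-mini-gene fractions computed this way are coarse, which is one reason aggregate analyses across many mini-genes are informative.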

Degenerate Sequences in Both Libraries Strongly 
Influence Isoform Ratios 

In the alternative 5' library, isoforms were present from several 
different splicing events. The most upstream splice donor (SD1)
was used on average 22.4% of the time, while SD2 was used
50.0% of the time (Figure 2B). The remaining transcripts were
spliced at new splice donors inserted into the randomized
regions (11.3%), a cryptic splice donor site (SDcrypt) 35 nt
downstream of SD2 (7.9%), or not spliced at all (8.4%). However,
as evidenced by the broad distributions of usage at each SD
(Figure 2B), the degenerate regions had a strong influence on
splice site selection. For instance, although 49.7% of mini-genes
spliced at SD1 with less than 5% frequency, 7,705 mini-genes
(2.9%) spliced at SD1 with over 95% frequency.

In the alternative 3' library, we also found isoforms from 
different splicing events, although splice site usage was less 
evenly balanced than in the 5' library. SA1 was used an average
of 3.3% of the time, while SA2 was used 89.2% of the time
(Figure 2D). In this library, new splice sites in the randomized re- 
gions were only used with 0.3% frequency, probably reflecting 
the larger information footprint of splice acceptors (>20 nt) 
compared to splice donors (9 nt), which makes the occurrence 
of new sites within the degenerate regions less likely. Similarly 
to the 5' library, we inadvertently inserted a cryptic splice 
acceptor 16 nt upstream of SA2 that was used with 4.6% frequency.
Many other cryptic splice sites were used with very
low frequency (1 x 10“^ to 5 x 10“^) accounting for a total of 
2.3% of transcripts. In contrast with the alternative 5' library, 
only 0.3% of transcripts were unspliced. Although SA2 was the
dominant splice site, 0.7% of the 1.2 M mini-genes represented
by multiple reads spliced 100% at SA1.

With so many transcripts in each library splicing at new splice 
sites, we asked whether we could rediscover the known motifs 
for splice donors and splice acceptors from the de novo sites 
alone. When we plotted the relative frequencies of each base 
at each position for new splice donors (Figure 2E) and new 
splice acceptors (Figure 2F), both splice site motifs were nearly 
identical to the expected motifs for splice donors and splice 
acceptors. More specifically, the splice donors contained the ca- 
nonical GT at the +1:+2 positions, while the splice acceptors 
contain a clear polypyrimidine tract (T and C rich), followed by 
N[CT]AGG. The ability to fully rediscover canonical signals for 
splice donors and splice acceptors demonstrates the rich type 
of information contained in each dataset. 
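Rediscovering a splice site motif from de novo sites amounts to aligning the sites and computing the frequency of each base at each position. A minimal sketch follows; the four 9-nt donor sequences are hypothetical examples, not sequences from the library.

```python
from collections import Counter


def position_frequencies(sites):
    """Per-position base frequencies for a list of equal-length, aligned sites."""
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")
    frequencies = []
    for column in zip(*sites):
        counts = Counter(column)
        frequencies.append({base: counts.get(base, 0) / len(sites) for base in "ACGT"})
    return frequencies


# Hypothetical 9-nt donor windows: 3 exonic bases followed by 6 intronic bases,
# so the +1:+2 positions are indices 3 and 4.
toy_donors = ["CAGGTAAGT", "AAGGTGAGT", "CAGGTAAGA", "TAGGTAAGT"]
freqs = position_frequencies(toy_donors)
print(freqs[3]["G"], freqs[4]["T"])
```

A sequence logo such as those in Figures 2E and 2F is a rendering of this kind of frequency table, usually scaled by information content.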

We also asked whether translation might affect the mRNA sta- 
bility in our libraries. Sequencing of the alternative 5' library 
yielded fewer median reads on mRNA from mini-genes that 
were primarily spliced out of frame than in frame (Figure S1A).
However, when the mini-genes contained a premature stop 
codon, the median number of reads per mRNA was similar for 
all three reading frames (Figure S1B). These results indicate
that a large string of amino acids translated out of frame will 
destabilize the mRNA, likely through the no-go decay pathway 
(Doma and Parker, 2006; Shoemaker et al., 2010) as ribo- 
somes stall due to protein misfolding. We also find evidence of 



nonsense-mediated decay, but only if the premature stop codon 
occurred >40 nt upstream of the splice donor. This is consistent 
with previous studies on nonsense-mediated decay that suggest 
the premature stop codon must occur >50 nt upstream of the last 
exon junction (Lewis et al., 2003). 
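The positional rule cited above (a premature stop codon more than 50 nt upstream of the last exon junction) can be written as a small predicate. This is a sketch only: measuring the distance from the end of the stop codon is our assumption, the function names are hypothetical, and the rule is a heuristic rather than a full model of nonsense-mediated decay.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_stop_position(mrna, frame=0):
    """Index of the first in-frame stop codon, or None if there is none."""
    for i in range(frame, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            return i
    return None


def predicted_nmd_target(mrna, last_junction, frame=0, min_distance=50):
    """True if the first in-frame stop ends more than min_distance nt
    upstream of the last exon junction (distance convention is an assumption)."""
    stop = first_stop_position(mrna.upper(), frame)
    if stop is None:
        return False
    return last_junction - (stop + 3) > min_distance


# Hypothetical transcript: start codon, 5 codons, a stop, then 30 more codons.
toy_mrna = "ATG" + "GCT" * 5 + "TAA" + "GCT" * 30
print(predicted_nmd_target(toy_mrna, last_junction=100))  # stop ends 79 nt upstream
print(predicted_nmd_target(toy_mrna, last_junction=60))   # stop ends 39 nt upstream
```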

Splicing Is More Likely to Occur at Upstream Splice 
Donors 

From an analysis of the new splice sites, we found strong evi- 
dence that upstream splice donors were favored over down- 
stream splice donors; new splice donors inserted in the first 
degenerate region were 4.1 times more likely to be used than 
new splice donors inserted into the second degenerate region 
(region 1: 849,666 spliced reads; region 2: 208,396 spliced 
reads). Furthermore, the effect of position of splice donors within 
each degenerate region was significant (p < 0.005; Figure S1C).
The number of spliced reads at a new splice site decayed exponentially
with the distance from SD1 (Figure 2G). Splicing has
been shown to be co-transcriptional, and spliceosome components
can begin to assemble at a 5' splice donor before downstream
alternative splice sites are transcribed (Listerman et al.,
2006), suggesting a potential mechanistic explanation for the 
observed effect. This strong bias for upstream splice donors is 
consistent with the typically short length of exons in the human 
genome (Burge and Karlin, 1997). 
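Two pieces of arithmetic underlie this section: the ratio of spliced reads between the two degenerate regions, and an exponential decay of usage with distance. The sketch below reproduces the ratio from the read counts quoted above and shows one simple way to estimate a decay rate, a least-squares line through log-transformed counts. The fitting approach and the synthetic data are our assumptions; the text does not state how the decay was fitted.

```python
from math import exp, log


def fit_exponential_decay(distances, counts):
    """Least-squares fit of log(count) = log(a) - b * distance; returns (a, b)."""
    ys = [log(c) for c in counts]
    n = len(distances)
    mean_x = sum(distances) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in distances)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(distances, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return exp(intercept), -slope


# Ratio of spliced reads in region 1 versus region 2 (counts from the text).
region_ratio = 849_666 / 208_396
print(f"region 1 / region 2 = {region_ratio:.1f}")

# Synthetic, noise-free counts used only to show that the fit recovers known parameters.
toy_distances = list(range(0, 50, 5))
toy_counts = [1000.0 * exp(-0.05 * d) for d in toy_distances]
amplitude, rate = fit_exponential_decay(toy_distances, toy_counts)
print(f"a = {amplitude:.1f}, b = {rate:.3f} per nt")
```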

Splicing Is Less Likely to Occur at Splice Acceptors with 
Distal Branchpoints 

Large-scale mapping of human branchpoints with RNA-seq 
found that 90% of mapped branchpoints occur between 19 and
37 nt upstream of the splice acceptor (Mercer et al., 2015). However,
it remains unclear just how detrimental a distal branchpoint
is toward efficient splicing. Consensus branchpoints (CU[AG]A[CU])
occur over 10,000 times at every position between 40 and
19 nt upstream of SA1 in our dataset, allowing us to answer this
question. We found that mini-genes with a consensus branchpoint
sequence 19 nt upstream of SA1 were approximately six
times more likely to be spliced at SA1 relative to those with a
branchpoint 40 nt upstream of SA1 (Figure 2H). One explanation
for this phenomenon could be that distal branchpoints are more
likely to contain another AG between the branchpoint and SA1
that could be used as an alternative splice acceptor. However,
we observed a strong distance dependence on branchpoint position
for sequences both with and without an AG between the
branchpoint and SA1 (Figure S1D). This result suggests that
the mechanism by which distal branchpoints reduce splicing efficiency
is primarily the increased distance between the
branchpoint and the splice acceptor and/or polypyrimidine tract.
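Locating consensus branchpoints relative to a splice acceptor is a pattern search followed by a distance calculation. The sketch below takes the consensus as CU[AG]A[CU], written so that it matches both DNA and RNA alphabets, and measures distance from the branch adenosine to the acceptor index. That distance convention, the function name, and the toy intron are assumptions made for illustration.

```python
import re

# Lookahead so that overlapping candidate sites are all reported.
BRANCHPOINT = re.compile(r"(?=(C[TU][AG]A[CTU]))")


def branchpoint_distances(intron, acceptor_index):
    """Distances (in nt) from each consensus branch adenosine to acceptor_index."""
    seq = intron.upper()
    distances = []
    for match in BRANCHPOINT.finditer(seq[:acceptor_index]):
        branch_a = match.start() + 3  # the A is the fourth base of the consensus
        distances.append(acceptor_index - branch_a)
    return distances


# Hypothetical intron end: filler, one consensus site, a T-rich tract, then AG.
toy_intron = "G" * 10 + "CTAAC" + "T" * 20 + "AG"
print(branchpoint_distances(toy_intron, len(toy_intron)))
```

Grouping mini-genes by the distance returned here and comparing their mean SA1 usage would give a curve of the kind summarized in Figure 2H.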

Sequence Motifs in Alternative Exons Have a Stronger 
Regulatory Role than Intronic Sequences 

Next, we asked how short sequence motifs affect splice site
selection in different contexts. We chose to analyze the effects
of 6-mers because each possible 6-mer occurs within an average of
1,294 mini-genes for the alternative 5' library and 8,232
mini-genes for the alternative 3' library. Furthermore, most known
RNA binding proteins (RBPs) are reported to bind sequences of
4-8 nt (Lunde et al., 2007). To estimate the effect of
Figure 2. Splice Site Selection in Two Million Alternative 5' and 3' Spliced Sequences

(A) A schematic of the alternative 5' library. Spliced reads map to SD1, SD2, and a cryptic splice site (SDcrypt), as well as new splice donors (SDnew) created in the
degenerate regions.

(B) Distributions of splice site usage across library mini-genes. Distributions are shown for SD1, SD2, SDcrypt, and SDnew. Insets correspond to the framed
regions in the main graph. Mean splice site usage is indicated with a blue vertical line.

(C) A schematic of the alternative 3' library. Spliced reads map to SA1, SA2, and a cryptic splice site (SAcrypt), as well as new splice donors (SDnew) created in the
degenerate regions.

(D) Distributions of splice site usage across library mini-genes. Distributions are shown for SD1, SD2, SDcrypt, and SDnew. Insets correspond to the framed
regions in the main graph. Mean splice site usage is indicated with a blue vertical line.

(E) The splice donor motif recovered from the new splice alternative 5' library matches the previously known human splice donor site. 

(F) The splice acceptor motif recovered from the new splice alternative 5' library matches the previously known human splice acceptor site. 

each possible 6-mer in each region, we calculated splice site
usage for the subset of mini-genes containing the 6-mer and for the
much larger subset not containing the motif. We then asked to what
extent the odds of splicing at a splice site changed in the
presence of the motif relative to the control set. To quantify
this “effect size,” we used the log2 odds ratio with and without
the 6-mer present (Supplemental Experimental Procedures). For
example, we found that mini-genes containing the 6-mer GTGGGG in
the first degenerate region of the 5' library were spliced at SD2
only 19.0% of the time, while RNA derived from mini-genes not
containing this motif spliced at SD2 50.2% of the time, resulting
in an effect size of -2.1 (Figures S2A-S2D). In other words, the
odds of splicing at SD2 are 4.29 (2^2.1) times lower in the
presence of GTGGGG compared to its absence.
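
The GTGGGG example can be reproduced from the two reported percentages. The sketch below is an assumption-laden simplification: it treats the pooled usage fractions as the inputs, whereas the exact procedure is given in the Supplemental Experimental Procedures. The function name `log2_odds_ratio` is hypothetical.

```python
import math


def log2_odds_ratio(frac_with, frac_without):
    """Effect size: log2 of the odds with the motif over the odds without it."""
    odds_with = frac_with / (1.0 - frac_with)
    odds_without = frac_without / (1.0 - frac_without)
    return math.log2(odds_with / odds_without)


# SD2 usage with and without GTGGGG, taken from the text above.
effect = log2_odds_ratio(0.190, 0.502)
print(round(effect, 1))  # -2.1
```

Raising 2 to the magnitude of the rounded effect size (2^2.1) gives the reported 4.29-fold reduction in odds.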

In Figure 3A, we plot the empirically measured effect sizes of all
hexamers in the first degenerate region on the relative usage of
SD2 and SD1, with 95% confidence intervals. The strongest
enhancers located in the alternative exon (included when splicing
occurs at SD2, but excluded when splicing occurs at SD1) increased
the odds of splicing at SD2 4.38-fold, while the strongest
silencers decreased the odds 16-fold. Approximately 15% of 6-mers
(622/4,096) have been previously identified as SREs (Culler et
al., 2010; Fairbrother et al., 2002; Wang et al., 2004, 2012), but
here 82.9% of 6-mers (3,396/4,096) exhibited a significant effect
on isoform selection (95% confidence interval does not contain
zero effect size). Intriguingly, the cumulative effects of
previously identified SREs accounted for only 20% of the
cumulative effects of all possible 6-mers. The strongest silencers
were G rich, consistent with known binding sites for hnRNPs
(Martínez-Contreras et al., 2006). On the other hand, some of the
strongest enhancers for SD2 appear to act by generating secondary
structure around SD1: the 6-mers perfectly complementary to part
of SD1 (-3 to +8) were all in the top 6% of SD2 enhancers
(percentiles: 97.77, 99.75, 99.97, 94.23, 94.79, and 98.92).

We then looked at the effects of 6-mers in the second degenerate
region (3' to SD2). Unlike the first degenerate region, which is
located within the alternative exon region, the second degenerate
region is intronic to both SD1 and SD2. We found that the effect
sizes were much smaller than in the first degenerate region
(Figure 3B). The strongest enhancer and silencer of SD2 changed
the odds of splicing at SD2 relative to SD1 only 1.95-fold and
1.48-fold, respectively. Furthermore, only 36.7% of 6-mers
(1,505/4,096) had a statistically significant effect.

We performed a similar analysis for each degenerate region on the
usage of SA1 in the alternative 3' library (Figures 3C and 3D).
Again, we found that motifs in the alternative exon (3' of SA1,
but 5' of SA2) had strong effect sizes (statistically significant
6-mer effect sizes: 3,500/4,096, 85.4%; strongest enhancer:
3.84-fold increase in odds of splicing at SD2; strongest silencer:
9.87-fold decrease in odds of splicing at SD2). Unlike in the
alternative 5' library, we found that motifs in the intronic
degenerate region (5' of SA1 and SA2) also had quite strong
effects (statistically significant 6-mer effect sizes:
3,248/4,096, 79.3%; strongest enhancer: 3.45-fold increase in odds
ratio; strongest silencer: 4.63-fold decrease in odds ratio),
although still generally smaller in magnitude than in the
downstream alternative exon region. When we looked at the
strongest 6-mer enhancers of SA1 in this intronic region, we found
they all fit the consensus branchpoint sequence CU[AG]A[CU]
(Figure 3D).

The Same Sequence Motifs Regulate Alternative Exon 
Inclusion Independent of the Type of Alternative 
Splicing 

Surprisingly, we found that the effect sizes of 6-mers occurring
within the alternative exon regions were extremely similar between
the alternative 5' and 3' libraries (Figure 3E; R² = 0.68). We
looked at several motifs known to bind splice factors or that have
previously been identified as ESEs/ESSs (G-run, SRSF1, hnRNPA1,
hnRNPH2) and found the effect sizes to be highly correlated. In
both libraries, GGGGGG was the strongest exonic silencer (5'
library: 16.0-fold change in odds ratio; 3' library: 9.87-fold
reduction in odds ratio).

We also compared the effect sizes of intronic 6-mers (second
randomized region in the alt. 5' library; first randomized region
in the alt. 3' library) between the two libraries. We found a
significant, but weaker, correlation between the 6-mer scores
(R² = 0.27; Figure S2E). The first randomized region in the
alternative 3' library overlaps the expected location of the SA1
branchpoint, which may reduce the effect size correlation.
However, the weaker correlation can also be explained by the fact
that the effect sizes of intronic 6-mers were much smaller in
magnitude than those of 6-mers within the alternative exon
regions.

Sequence Motifs Regulate Exon Inclusion Additively 
Rather than Cooperatively 

Although previous studies have observed co-occurrence of 
conserved sequence motifs around splice sites (Barash et al., 
2010), it remains unclear whether such motifs act cooperatively 
or additively and independently of one another to regulate alter- 
native splicing. In an additive and independent model of regula- 
tion, the joint effect size of multiple motifs should simply equal 
the sum of the individual effect sizes (Figure 4A). To assess 
this, we examined the joint effect sizes of pairs of 4-mers on 
alternative exon-inclusion levels in both the 5' and 3' libraries. 
We chose 4-mers because pairs of 4-mers occur sufficiently
often within each randomized region to allow for robust effect
size measurements (alt. 5' library: 692 mini-genes/4-mer pair;
alt. 3' library: 4,399 mini-genes/4-mer pair).
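
The additive and independent model described above can be written down in a few lines. This is a hedged sketch: the effect sizes below are hypothetical numbers chosen for illustration (not measurements from the libraries), the helper names are invented for the example, and R² is computed here as the coefficient of determination, which may differ from the statistic used in the study.

```python
def additive_prediction(single_effects, pair):
    """Predicted joint effect size of a 4-mer pair under the additive model."""
    first, second = pair
    return single_effects[first] + single_effects[second]


def r_squared(observed, predicted):
    """Coefficient of determination between observed and predicted values."""
    mean_obs = sum(observed) / len(observed)
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return 1.0 - ss_res / ss_tot


# Hypothetical effect sizes, for illustration only.
single = {"GTGG": -1.0, "CTGC": 0.4, "GAAG": 0.8}
joint_observed = {
    ("GTGG", "CTGC"): -0.5,
    ("GTGG", "GAAG"): -0.3,
    ("CTGC", "GAAG"): 1.1,
}

pairs = list(joint_observed)
predicted = [additive_prediction(single, p) for p in pairs]
observed = [joint_observed[p] for p in pairs]
print(r_squared(observed, predicted))
```

An R² close to 1 means the joint effects are well described by the sum of the individual effects; systematic residuals would instead point to cooperative interactions.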

We first calculated the individual effect size of all 4-mers on 
exon inclusion in the 5' library. We then calculated the joint effect 
size of every possible pair of non-overlapping 4-mers. Surpris- 
ingly, we found that combinatorial effects were extremely well 



(G) The number of spliced reads at each position within the randomized regions shows a strong position dependency. Splicing is more likely to occur at an
upstream (5') splice donor than at a downstream (3') splice donor. The gray line is a fit that shows the linear relationship between the location of splice donor and
the log read count at that location.

(H) Mini-genes with a consensus branchpoint near SA1 are much more likely to use SA1 than mini-genes with a distal branchpoint. The red line indicates the SA1
usage when there is no consensus branchpoint.

See also Figure S1.
Figure 3. Measured Effect Sizes of Individual 6-mers in Each Degenerate Region

(A-D) To measure how sequence motifs alter the relative use of SD2/SD1 or SA1/SA2, we calculated effect sizes for every 6-mer (n = 4,096) within each degenerate
region in both libraries. We defined effect sizes as the log odds ratio of SD2 or SA1 usage between mini-genes with/without the 6-mer of interest. The 6-mers are
ranked by estimated effect size and plotted with 95% confidence intervals generated by bootstrapping with replacement. (A) Alternative exon region in 5' library.
(B) Intronic region in 5' library. (C) Alternative exon region in 3' library. (D) Intronic region in 3' library.

(E) The 6-mer scores in the alternative exon region in both the 5' and 3' libraries (A and C) are highly similar, suggesting alternative splicing in both libraries is 
regulated by the same mechanism. 

See also Figure S2. 
Additive Model:

Effect Size(GTGG & CTGC) = Effect Size(GTGG) + Effect Size(CTGC) 
Figure 4. Combinatorial Regulation of
Alternative Splicing Is Additive

(A) An additive model of alternative splicing regulation: the
joint effect size of two 4-mers is equal to the sum of the
individual 4-mer effects.

(B) Using an additive model, the predicted combinatorial effect
size of every pair of 4-mers (n = 65,536) is plotted on the left.
Each pixel corresponds to a pair of 4-mers with the 5' 4-mer on
the x axis and the 3' 4-mer on the y axis. The measured
combinatorial effect sizes from the 5' library data are plotted in
the middle. The residuals between the additive model and the
observed data are plotted on the right. The additive model
explains >90% of the combinatorial effect sizes (R² = 0.913).

(C) The same analysis is repeated for the alternative 3' library.
In this library the additive model explains over 95% of the
combinatorial effect sizes (R² = 0.954).

See also Figure S3. 



captured by the sum of the individual effect sizes of the 4-mers
(R² = 0.913; Figure 4B). We did the same analysis for 4-mers
located in the second degenerate region of the 3' library. Here,
the linear model fit the experimental data even better (R² =
0.954; Figure 4C). Thus, while specific instances of cooperative
sequence interactions have been well documented (Huelga et al.,
2012; Oberstrass et al., 2005), our results suggest the majority
of motifs primarily exert their influence on exon inclusion
independently of the surrounding motifs.

Predicting Isoform Ratios in Alternative Splicing from
Sequence Information

We then turned to the task of learning a model of alternative
splicing to predict isoform levels from sequence information.
Because combinatorial regulation of alternative splicing was
accurately captured by an additive model, we postulated that an
additive model with short sequences as input features would
perform well for prediction. Using both the 5' and 3' libraries,
we trained a joint model of alternative exon definition in which a
score is learned for each of the 4,096 possible 6-mers (Figure
S3A). The scores learned here are similar to the previously
calculated effect sizes, but rather than measuring the effects of
a single 6-mer one at a time, we learned all the scores together
through regression. Given the large number of new splice donors
appearing within the 5' library, we also chose to train a model of
the splice donor site itself (Figure S3B). When we tested the
splice donor model using cross validation, we found it accurately
predicted the fraction of reads mapping to the three original
splice donors, accounting for up to 75% of observed isoform
variability (R²: SD1 = 0.75, SD2 = 0.75, SDcrypt = 0.54; Figure
5A). It also proved accurate in predicting the position and
fraction of reads mapping to newly created splice donor sites
within the degenerate regions (R²: 0.83; Figures 5A and 5B).
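
To make the idea of an additive model over short sequence features concrete, the sketch below sums one weight per 6-mer across a sequence and maps the total to a usage fraction. It is a minimal illustration under stated assumptions: the function names, the example weight, and the logistic link are all hypothetical and are not taken from the trained model, whose exact form is described in the supplemental material.

```python
import math


def kmers(seq, k=6):
    """All overlapping k-mers of a sequence, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def additive_score(seq, weights, k=6):
    """Sum of per-k-mer weights; k-mers without a weight contribute zero."""
    return sum(weights.get(kmer, 0.0) for kmer in kmers(seq, k))


def predicted_fraction(seq, weights, k=6):
    """Map the additive score to a fraction in (0, 1).

    The logistic link is an assumption of this sketch.
    """
    return 1.0 / (1.0 + math.exp(-additive_score(seq, weights, k)))


# Hypothetical weight for a G-rich silencer, for illustration only.
weights = {"GGGGGG": -1.0}
print(additive_score("GGGGGGG", weights))  # -2.0 (two overlapping matches)
```

In a model of this kind the weights would be fit jointly by regression over all mini-genes rather than set by hand.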

A fundamental advantage of testing synthetic sequences is the
ability to learn from larger datasets than were previously
available. To quantify this advantage, we calculated learning
curves on a simple model predicting usage of SD1 in the
alternative 5' library. We split our data into training and test
sets (90%/10% split) and trained models using subsets of the
training data (between 100 and 177,827 training points). We also
trained separate models using 3-mers, 4-mers, 5-mers, 6-mers, or
7-mers. With limited data (1,000 or fewer training points), the
simplest model (3-mers) made the most accurate predictions, while
the 7-mer model made the least accurate predictions, with the
other models ordered in between (Figure 5C). However, with the
largest training subset (177,827 points), the results were
reversed, with the 7-mer model achieving the highest accuracy.
Based on the slopes of the learning curves, the 3-mer to 5-mer
models would not benefit significantly from more data points
(>177,827), but the 6-mer, and especially the 7-mer, models seem
likely to achieve significantly higher prediction accuracy with
larger training sets. These results highlight the intuitive point
that richer feature sets can improve prediction accuracy but
require more data to train properly.
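
One way to see why longer k-mers need more data is to count how many distinct features each model has. The short sketch below assumes one feature per possible k-mer, which is a simplification (a model could, for example, use separate features per region); `n_kmers` is a name introduced for this example.

```python
def n_kmers(k):
    """Number of distinct DNA k-mers."""
    return 4 ** k


LARGEST_TRAINING_SET = 177_827  # largest training subset quoted in the text

for k in range(3, 8):
    features = n_kmers(k)
    # Training points available per feature at the largest training size.
    print(k, features, round(LARGEST_TRAINING_SET / features, 1))
```

The number of features grows fourfold with each added base, so at a fixed training size the 7-mer model has far fewer training points per parameter than the 3-mer model.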

Predicting the Effects of Human Genomic SNPs on 
Alternative Isoform Ratios 

Next, we asked whether we could apply our model (HAL [hexamer
additive linear]), developed entirely in the context of synthetic
mini-genes, to predict changes in alternative splice donor usage
caused by common polymorphisms in human genomes. As a first test
case, we focused on 5' alternative splicing. Combining DNA and RNA
sequencing data from the 1000 Genomes Project (Abecasis et al.,
2012) and the GEUVADIS consortium (Lappalainen et al., 2013),
respectively, we calculated the percent of splicing at the
downstream alternative splice donor (percent spliced in [PSI]) of
wild-type genotypes for 8,546 5' alternative splicing events using
the MISO software package (Katz et al., 2010). We separately
calculated mean isoform levels for genotypes heterozygous or
homozygous for a single SNP in the region between the two
competing splice donors or within the splice donors themselves
(Table S1).
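
For orientation, PSI as defined above (the percent of splicing at the downstream alternative splice donor) can be illustrated with a naive read-count ratio. This is only a sketch: MISO estimates PSI with a probabilistic model rather than a raw ratio, and the function name and read counts below are hypothetical.

```python
def psi(downstream_reads, upstream_reads):
    """Percent spliced in: share of spliced reads using the downstream donor."""
    total = downstream_reads + upstream_reads
    if total == 0:
        raise ValueError("no spliced reads for this event")
    return 100.0 * downstream_reads / total


# Hypothetical read counts for a wild-type and a SNP-carrying genotype.
wild_type = psi(30, 70)
variant = psi(45, 55)
print(wild_type, variant, variant - wild_type)  # 30.0 45.0 15.0
```

A SNP would count as altering PSI by more than 5% when the absolute difference between the two genotypes exceeds 5 percentage points.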

We began by investigating whether the model of the actual 9-nt
splice donor sequence, again learned completely from our synthetic
mini-genes, could accurately predict the effects of SNPs occurring
within splice donor sequences. We also compared our prediction
accuracy to a leading splice donor prediction tool trained
directly from splice donor usage in the human genome (MaxEnt) (Yeo
and Burge, 2004). Among heterozygous SNPs in alternative splice
donors occurring in multiple individuals, we found that 93 of 199
SNPs altered PSI by >5% (Figures 6A and 6B). Within this set, HAL
predicted the direction of change with 87.1% accuracy (81/93;
binomial p = 9.83 × 10^-14), while MaxEnt predicted the direction
of change with 81.7% accuracy (76/93; binomial p = 4.45 × 10^-10).
Among the 35 homozygous SNPs in splice donors that alter PSI by
>5%, our model predicted every SNP correctly, while MaxEnt made
two mistakes (HAL: 35/35, binomial p = 5.82 × 10^-11; MaxEnt:
33/35, binomial p = 3.67 × 10^-8). For the set of
SNPs within splice donors, our model explained 59.3% of the
observed heterozygous effects (R² = 0.593, p = 6.38 x 10“®)
and 67.7% of the observed homozygous effects (R² = 0.677,
p = 4.65 x 10“^^). This is a substantial improvement over MaxEnt,
which accounted for 39.8% of the observed heterozygous effects
(R² = 0.398, p = 1.22 x 10“^^) and 41.1% of the observed
homozygous effects (R² = 0.411, p = 3.3 x 10“®). Even when we
extended our analysis to all SNPs (including those with less than
5% change in PSI), we found HAL substantially outperformed
MaxEnt (HAL: R² = 0.48; MaxEnt: R² = 0.22; Figure S4A).
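
The binomial p-values quoted for direction-of-change accuracy are consistent with an exact two-sided binomial test against a 50% chance of guessing the direction correctly. That reading is an assumption of this sketch (the test is not spelled out in this section), and `two_sided_binomial_p` is a name introduced for the example.

```python
from math import comb


def two_sided_binomial_p(successes, trials):
    """Exact two-sided binomial p-value against a null success rate of 0.5.

    The null distribution is symmetric, so the two-sided p-value is twice
    the tail probability of the more extreme side, capped at 1.
    """
    k = max(successes, trials - successes)
    upper_tail = sum(comb(trials, i) for i in range(k, trials + 1)) / 2 ** trials
    return min(1.0, 2.0 * upper_tail)


print(f"{two_sided_binomial_p(35, 35):.2e}")  # 5.82e-11
print(f"{two_sided_binomial_p(33, 35):.2e}")  # 3.67e-08
```

The two printed values correspond to the 35/35 and 33/35 homozygous splice donor results reported for HAL and MaxEnt.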

We then applied the model to predict the effects of human
genomic SNPs in the alternative exon region between, but not
overlapping, splice donors. Because most SNPs not occurring in
actual splice sites are likely to have only modest effects, we
restricted our analysis to SNPs with at least ten homozygous
wild-type or ten heterozygous samples expressing the relevant
mRNA. Moreover, we focused on SNPs that resulted in a change in
the PSI of at least 5% to minimize the impact of measurement noise
on the validation dataset; 43/344 heterozygous and 20/131
homozygous SNPs altered the PSI by >5% (Figure 6C). HAL correctly
predicted the direction of change for 37/43 heterozygous and 17/20
homozygous SNPs (p: heterozygous = 1.63 × 10^-6, homozygous =
2.58 × 10^-3, combined = 6.11 × 10^-9). Furthermore, our model
explained around half of the total observed
effects of these SNPs (heterozygous: R² = 0.570, p = 9.23 x 10“®;
homozygous: R² = 0.442, p = 1.39 x 10“®). Thus, our model
not only outperformed the state of the art splice donor algorithm 
(MaxEnt) at predicting the effects of SNPs within splice donors 
but also successfully predicted the effects of SNPs within the alter- 
native exon region, which to our knowledge, no other tool can do. 

Predicting Alternative 5' Isoform Levels from Sequence
Information

To further assess the accuracy of our splice donor model, we
predicted the isoform ratios in 6,152 alternative 5' splicing events
expressed in lymphoblastoid cell lines and compared our results
to four other splice donor prediction algorithms. Our splice donor 
model substantially outperformed all of the other algorithms (Fig- 
ure S5; Table S2). Interestingly, all of the models (including ours) 
performed better on events with shorter alternative exon regions 
(i.e., the region between splice donors). In these events, there is 
less space for regulation between the splice donors, possibly 
simplifying the prediction task. 

Predicting the Effects of Variants on Exon Skipping in 
Mendelian Diseases 

The most common form of alternative splicing is neither
alternative 5' nor 3' splicing, but exon skipping. Exon skipping is
a highly regulated form of alternative splicing in human cells,
and misregulation of cassette exon splicing can cause disease
(Garcia-Blanco et al., 2004) and cancer (Kim et al., 2008). Given
the relatively more complex structure of skipped exons, it might
at first sight seem unlikely that a model trained only on 5' and 3'
alternative splicing should be able to predict levels of exon
inclusion. However, we hypothesized that the similarity between



Figure 5. A Model Accurately Predicts Alternative 5' Splicing and the Location of New Splice Donors

(A) For each splice donor (SD1, SD2, SDcrypt), model predictions are plotted against the observed splice site usage fraction. Each point represents a single test
plasmid. The results are also plotted for all new splice sites (SDnew).

(B) The prediction results for three different mini-genes are shown with the associated nucleotide scores for each isoform. Each nucleotide score is calculated by
averaging the model weights of all 6-mers overlapping the nucleotide. In the first example mini-gene, HAL predicts the usage and position of a new splice donor,
which is confirmed by RNA-seq.

(C) A learning curve was generated for different models that predict the fraction of splicing at SD1. The simplest model (3-mer features) performed the best with
small training sets (<1,000 data points), but with more data points, richer feature sets offer better performance.

See also Figure S5. 
Figure 6. Splicing Model Identifies the
Functional Effect of SNPs on Alternative
Splicing

(A-C) Model predictions are plotted with the PSI measured from
RNA-seq for SNPs occurring in the upstream splice donor (A), the
downstream splice donor (B), and between the competing splice
donors (C) that alter the measured PSI by greater than 5%. The
observed PSI from RNA-seq for the wild-type genotype (gray bar)
and genotypes containing the SNP (red) are plotted together with
the model prediction (blue). The model accurately predicts the
direction of change of the heterozygous SNPs in splice donors with
87.1% accuracy (81/93; binomial p = 9.83 × 10^-14) and the
heterozygous SNPs between splice donors with 86.0% accuracy
(37/43; binomial p = 8.18 × 10^-7).
See also Figure S4 and Tables S1 and S2.




the sequence determinants of alternative exons in alternative 5' 
and 3' splicing might extend to exon skipping as well. If this 
were the case, we would expect our model to accurately predict 
the effects of exonic sequence variants on skipped exon-inclu- 
sion levels, even though it was never trained directly on any 
exon skipping data. We tested this hypothesis in the context of 
mutations in several distinct genes that are known to cause Men- 
delian disease by promoting exon skipping (Figure 7A; Table S3). 

First, we compared model predictions to experimental data for the
SMN1 and SMN2 genes, whose misregulation can lead to spinal
muscular atrophy. Our model correctly predicted increased or
decreased exon 7 inclusion in 205/229 (89.5%; Figure 7D) variants
with experimental data. In Figure 7B, we compare predictions
(increased or decreased exon inclusion) to experimental data. To
make the plot more readable, we only included a single SNP at each
position. Our model accurately predicts increased/decreased exon
inclusion for 20/22 of the plotted SNPs. On just the variants with
quantitative data (n = 131), our model explained 65% of the
observed variance (R² = 0.65; Figure 7E). The SMN1/2 variants that
we tested included SNPs, indels, and combinations of up to 30 nt
changes.

We then tested our model on variants in CFTR, whose misregulation
can lead to cystic fibrosis. Our model correctly predicted
increased/decreased exon 12 inclusion in 19/22 variants (Figure
7D). When we only looked at the SNP with the largest effect at
each position, our model accurately predicted increased/decreased
exon inclusion for 11/12 SNPs (Figure 7C). Among all the CFTR
variants, our model explained 60% of the observed variance (R² =
0.60; Figure 7E).

Next, we tested our model predictions on variants in exon 7 of
the BRCA2 gene, a tumor suppressor responsible for DNA damage
repair. Mutations in BRCA2 affecting the ability of the protein to
repair DNA lead to such an increased risk of ovarian and breast
cancer that patients with these mutations may choose to have
prophylactic surgery. However, the effect of many variants on
alternative splicing, and hence on protein function, remains
unknown, forcing patients and doctors to make clinical decisions
with limited information. The ability to identify deleterious
variants computationally can provide valuable information to
Figure 7. Predicting the Effects of Exonic
Variants on Exon Skipping

(A) The inputs to the splicing model can include 
SNPs, indels, or complex variants within the alter- 
native exon. The splicing model then predicts the 
exon inclusion levels with the variant present. 

(B) Model predictions are compared to experi- 
mental results using RT-PCR for SNPs occurring 
in exon 7 of SMN2. For positions with data for 
multiple SNPs, the SNP with the largest measured 
change in PSI was plotted. The model accurately 
predicted the directional change in PSI (increased 
exon inclusion/exclusion) for 20/22 SNPs plotted. 

(C) Model predictions are compared to experi- 
mental results using RT-PCR for SNPs occurring 
in exon 12 of CFTR. The model accurately pre- 
dicted the directional change in PSI for 11/12 
SNPs plotted. 

(D) The prediction accuracy for variants in SMN2, 
CFTR, and BRCA2 ranged from 86% to 90%. 

(E) The change in PSI is plotted for every variant
with RT-PCR data. The model explains over 60%
of the effects of variants in each gene
tested (SMN1/2, CFTR, and BRCA2).

See also Figure S6 and Table S3. 
patients with these variants of unknown significance. Our model
correctly predicted increased/decreased exon 7 inclusion for
31/35 variants that experimentally altered inclusion levels
(Figure 7D). The model correctly predicted 19/22 of the SNPs with
the largest effect at each position within the exon (Figure S6B).
Among all the BRCA2 variants, our model explained 67% of the
observed variance (R² = 0.67; Figure 7E).

We then compared our results to SPANR (Xiong et al., 2014), the
current state of the art in predicting the effects of SNPs on exon
skipping. SPANR consists of a Bayesian deep learning algorithm
trained on exon skipping events in the human genome with 1,393
carefully hand-selected features. As of this paper, SPANR only
supports predictions of SNPs, so we were not able to compare our
predictions on more complex variants. However, for SNPs in
SMN1/2, CFTR, and BRCA2, we found that HAL accounted for three
times more of the observed effects than SPANR (HAL: R² = 0.51;
SPANR: R² = 0.17; Figure S6A). We made HAL publicly available at
http://splicing.cs.washington.edu. All of the code to reproduce
this study is publicly available at
https://github.com/Alex-Rosenberg/cell-2015.

DISCUSSION 

We present a framework based on massively parallel analysis of
synthetic sequences to dramatically improve our understanding of
alternative splicing and the ability to predict the impact of
natural human genetic variation. Our model accurately predicts the
effects of sequence variants on alternative 5' splicing that occur
both within the alternative exon and in the competing splice
donors. Even more importantly, our model learned regulatory rules
about alternative splicing that generalized to exon skipping, a
completely different form of alternative splicing from those on
which the model was trained.

Our results suggest that a common regulatory mechanism is 
shared between all major forms of alternative splicing. Additional 
evidence for such a common mode of regulation comes from 
previous smaller-scale studies of ESEs or ESSs that have shown 
similar effects across different forms of alternative splicing 
(Wang et al., 2006, 2012). It is unlikely that this shared form of 
regulation occurs during splice site recognition; any exonic 
splice regulatory element that alters splice donor or splice 
acceptor recognition should have different effects in alternative 
5' and 3' splicing events. It is more likely that alternative exon
inclusion is modulated during exon definition, that is, the pairing
of splice sites across exons, which often precedes the eventual
pairing of splice donors and acceptors across introns (Robberson
et al., 1990).

Furthermore, our data also suggest that the exon-defining
interactions between the upstream splice acceptor and downstream
splice donor are regulated additively. In both alternative 5' and
3' splicing, we found the joint effect size of multiple 4-mers to
be highly correlated with the sum of the individual 4-mer effects.
This result may indicate that each sequence motif can contribute
additively to stabilizing the splice acceptor-splice donor
interaction, likely through the trans-factors that bind these
sites. However, the true mechanistic basis for this additivity
will require further investigation. Although there is evidence
supporting specific examples of functional interactions between
cis-splicing regulatory elements (Oberstrass et al., 2005), our
results indicate that these examples are likely uncommon.

A potential limitation of our approach is that mRNAs are tran- 
scribed from plasmids rather than directly from the genome, 
especially considering evidence suggesting that chromatin 
can influence alternative splicing (Luco et al., 2010). However, 
advances in high-throughput genome editing may make it 
possible to perturb the genome in a massively parallel fashion, 
which will enable extensions of our approach to probe the effects 
of chromatin on alternative splicing. In fact, recent work demon- 
strated that small-scale genomic libraries could be created 
through insertion of degenerate sequences directly into an
alternatively spliced gene locus (Findlay et al., 2014). Moreover,
our current work focused on mini-genes with short alternative
exons, and more work will be necessary to understand to what
extent our results generalize to other gene architectures.
However, human exons are typically short (an average 147 bp for
internal exons) (IHGSC et al., 2001), and, moreover, analysis of
sequence conservation suggests that most sequence determinants of
alternative splicing can be found within a few hundred nucleotides
of intron-exon junctions. It is important to emphasize that our
approach uncovers only cis-regulatory rules. Complementary
experiments that connect this cis-grammar to a repertoire of
trans-acting splice factor proteins are necessary to fully
understand the mechanisms underlying the regulation of alternative
splicing.

We have demonstrated that learning the sequence determi- 
nants of gene regulation from large libraries of synthetic se- 
quences can be used as a complementary approach to learning 
directly from the human genome. We assayed over two million 
alternatively spliced constructs, nearly two orders of magnitude 
more events than the 38,000 that are present in the human 
genome (Wang et al., 2008), containing over 100 Mb of synthetic
sequence. Our improved understanding of alternative splicing 
and performance in predicting the effects of genetic variants 
is not a result of more sophisticated machine learning algorithms 
but simply the result of learning from a larger and more reliable 
dataset. We anticipate that this general approach will be 
useful for advancing our biological understanding of diverse 
forms of gene regulation, such as transcription, translation, 
and polyadenylation. 

EXPERIMENTAL PROCEDURES 

Cloning of Degenerate Libraries 

The libraries were assembled with PCR and standard Gibson assembly
(Gibson et al., 2009) using degenerate oligonucleotides (IDTDNA). First, Citrine
was split into two exons, and the first exon of the Citrine gene was altered to
remove any potential splice donors, without altering the amino acid sequence. 
The introns with degenerate sequences were inserted between the two exons 
of Citrine. The barcode sequence was inserted into the 3' UTR of Citrine. 

Cell Culture and Transfection 

HEK293 cells were cultured in DMEM (Cellgro) plus 10% FBS and
L-glutamine/penicillin/streptomycin on coated plates. Plates were coated for 24 hr
with 8 ml of 100x diluted extracellular matrix gel (Sigma-Aldrich) before
HEK293 cells were added to the plates. For transfection of a complex pool
of plasmids, 1.2 million cells were seeded in a 10-cm dish 24 hr before
transfection. We mixed 10 µg of the plasmid library in 1 ml of Opti-MEM Reduced
Serum Medium (Life Technologies) with 30 µl of Lipofectamine LTX and
10 µl of Plus Reagent (Life Technologies), before transfecting into the 10-cm
dish. The DMEM was replaced 5 hr after transfection.

Isolation of RNA and Generation of cDNA 

Total RNA was extracted using RNeasy (QIAGEN) kits 24 hr after transfection. 
The optional on-column DNase I digest was performed with the RNase-Free
DNase Set (QIAGEN). Total RNA quality and purity were tested by measuring
the A260/A280 ratio on a NanoDrop 1000 Spectrophotometer and, in some
cases, by measuring the ratio of the 18S and 28S rRNA bands on a native
1% agarose gel. mRNA was separated from 35-48 µg total RNA using polyA
Spin mRNA Isolation Kits (New England Biolabs). Isolated mRNA was again
digested by DNase I for 30 min using the Turbo DNA-free Kit (Ambion). cDNA
was then synthesized from 109-374 ng mRNA using MultiScribe Reverse Transcriptase
(Ambion) and Oligo d(T)16 primers (Ambion). cDNA synthesis was
performed by holding reactions at 25°C for 10 min, 42°C for 110 min, and 
85°C for 5 min. The quality of cDNA and presence of DNA contamination 
were checked through qPCR: Citrine, mCherry, and TBP were compared using 
cDNA, no reverse transcription controls (NRTC), and a no template control 
(NTC). The results indicated that there was no plasmid or genomic DNA carry- 
over into the cDNA reactions. 

Generation of Illumina Flow Cell Compatible PCR Products from RNA
and DNA Library

The resultant cDNA was then amplified by PCR to generate products compatible
with the Illumina HiSeq2000 Flow Cell. PCR reactions were performed in
100 µl with 2x Phusion HF Master Mix (New England Biolabs), 50 pmol forward
primer, and 50 pmol reverse primer with sample-specific barcodes and 20% of
each cDNA reaction. Cycling was done on a BioRad T100 Thermal Cycler with
the following protocol: 98°C for 5 min, then seven cycles of 98°C for 10 s,
67.5°C for 15 s, 72°C for 30 s, and a final extension step at 72°C for 5 min.
The necessary number of cycles was determined for each sample by first
running qPCR reactions with EvaGreen in a BioRad CFX and determining
when fluorescence began to plateau. Following PCR, 10% of the products
were run on a 2% agarose gel to determine if the expected bands were present.
The remainder of the PCR products was purified using the QIAquick
PCR Purification Kit (QIAGEN) and eluted into 30 µl of EB. Concentrations,
as well as A260/280 and A260/230 ratios, were measured on a NanoDrop
1000 Spectrophotometer.

Illumina-compatible PCR products were also generated from the DNA
plasmid library with the same protocol as above, except the cDNA template
was replaced with 10 ng of plasmid library DNA and the PCR reaction was performed
with 20 cycles.

Sequencing Plasmid Library and RT-PCR Products 

Both the RT-PCR products and plasmid library PCR products were 
sequenced on either an Illumina HiSeq2000 or Illumina MiSeq with paired-end
reads. The forward read crossed the post-splicing exon-exon junction
and the reverse read covered the 3' UTR barcode. A 6-nt index read was 
used to sequence the sample barcode to determine if the read came from a 
DNA library or a cDNA library. 

Associating Degenerate Intronic Regions with 3' UTR Barcode Tags

Using the sequencing results of the DNA plasmid library, we first counted the 
number of reads for every observed barcode and calculated an average Phred 
quality score for each position. We discarded any barcode tags with fewer than
two reads or with an average Phred score below 20 at any position. We then
mapped each remaining tag to the associated degenerate sequence with the 
most reads. If each degenerate sequence had a single read, we chose the 
sequence with the highest minimum Phred score. 
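
The filtering and mapping rules above can be sketched in a few lines of Python. This is a minimal illustration and not the pipeline used in the study: the input layout (tuples of barcode, degenerate sequence, and per-base Phred lists) and the function name `map_barcodes` are assumptions made for the example. We also assume that the average Phred score is computed over the barcode bases, and that the tie-break uses the minimum Phred score over the degenerate-sequence bases.

```python
from collections import defaultdict

def map_barcodes(reads, min_reads=2, min_avg_phred=20):
    """Map each barcode tag to a single degenerate sequence.

    reads: iterable of (barcode, degenerate_seq, barcode_quals, seq_quals),
    where the quals are lists of per-base Phred scores (hypothetical layout).
    """
    by_barcode = defaultdict(list)
    for barcode, seq, bc_quals, seq_quals in reads:
        by_barcode[barcode].append((seq, bc_quals, seq_quals))

    mapping = {}
    for barcode, entries in by_barcode.items():
        # Discard tags observed fewer than min_reads times.
        if len(entries) < min_reads:
            continue
        # Average Phred score at each barcode position across all reads.
        n = len(entries)
        avg = [sum(col) / n for col in zip(*(e[1] for e in entries))]
        if min(avg) < min_avg_phred:
            continue
        # Count reads per degenerate sequence and track the best
        # minimum Phred score seen for each sequence.
        counts = defaultdict(int)
        best_min_q = defaultdict(int)
        for seq, _, seq_quals in entries:
            counts[seq] += 1
            best_min_q[seq] = max(best_min_q[seq], min(seq_quals))
        # The sequence with the most reads wins; ties (including the case
        # of one read per sequence) go to the highest minimum Phred score.
        mapping[barcode] = max(counts, key=lambda s: (counts[s], best_min_q[s]))
    return mapping
```

Note that the tie-break here is applied to any tie in read count, which is a slight generalization of the single-read case described in the text.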

Measuring Isoform Fractions from Sequencing Results 

For every read on an RT-PCR product, we recorded the splicing position (or 
lack of splicing) by aligning the read to the unspliced plasmid. Using the associated
barcode read, we were then able to tally the number of reads splicing
at each position for every plasmid in our library. With respect to the alternative
5' library, only reads that mapped to a splice donor with GT or GC in the +1
to +2 intronic positions were counted.
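
The tallying step can be illustrated with a short sketch. It is not the code used in the study; the function name `isoform_fractions`, the input layout, and the convention that a donor position is the 0-based index of the +1 intronic base are all assumptions made for the example.

```python
from collections import Counter, defaultdict

def isoform_fractions(read_calls, plasmids, require_gt_gc=True):
    """Compute per-plasmid isoform fractions from splice calls.

    read_calls: iterable of (barcode, donor), where donor is the 0-based
        index of the +1 intronic base in the unspliced plasmid sequence,
        or None for an unspliced read (hypothetical convention).
    plasmids: dict mapping barcode -> unspliced plasmid sequence.
    """
    counts = defaultdict(Counter)
    for barcode, donor in read_calls:
        seq = plasmids.get(barcode)
        if seq is None:
            continue  # barcode not present in the filtered library
        if donor is not None and require_gt_gc:
            # Keep only donors with GT or GC at intronic positions +1 to +2.
            if seq[donor:donor + 2] not in ("GT", "GC"):
                continue
        counts[barcode][donor] += 1

    fractions = {}
    for barcode, c in counts.items():
        total = sum(c.values())
        fractions[barcode] = {pos: n / total for pos, n in c.items()}
    return fractions
```

Each inner dictionary sums to one, so the value stored for a given donor position is the fraction of retained reads for that plasmid that used it.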
SUMMARY

The organization of a cell emerges from the inter- 
actions in protein networks. The interactome is criti- 
cally dependent on the strengths of interactions and 
the cellular abundances of the connected proteins, 
both of which span orders of magnitude. However, 
these aspects have not yet been analyzed globally. 
Here, we have generated a library of HeLa cell lines 
expressing 1,125 GFP-tagged proteins under near- 
endogenous control, which we used as input for a 
next-generation interaction survey. Using quantita- 
tive proteomics, we detect specific interactions, 
estimate interaction stoichiometries, and measure 
cellular abundances of interacting proteins. These 
three quantitative dimensions reveal that the protein 
network is dominated by weak, substoichiometric in- 
teractions that play a pivotal role in defining network 
topology. The minority of stable complexes can be 
identified by their unique stoichiometry signature. 
This study provides a rich interaction dataset con- 
necting thousands of proteins and introduces a 
framework for quantitative network analysis. 

INTRODUCTION 

Proteins are central protagonists of life at the molecular level. 
They interact for structural, regulatory, and catalytic purposes, 
forming macromolecular structures as well as stable or transient 
multi-protein complexes. Accordingly, protein interactions vary 
greatly in their biophysical properties, while protein abundances 
range from a few to millions of copies per cell. The interactome is 
therefore the product of two factors: binary affinities between 
protein interfaces (Rual et al., 2005; Stelzl et al., 2005; Rolland
et al., 2014) and the cellular proteome, which itself is character- 
ized by subcellular localization, post-translational modifications 
and protein concentrations (Hein et al., 2013; Mann et al., 2013). 

Mapping the protein interactome landscape has been a long- 
standing goal of modern biology and a variety of methods have 
been developed to this end (Seebacher and Gavin, 2011). Affinity
purification followed by mass spectrometry (AP-MS) can in prin- 
ciple determine the members of protein complexes in their 
cellular context in an unbiased manner (Gingras et al., 2007) 
and has enabled large-scale protein interaction studies of 
several model organisms, including human cells (Ewing et al., 
2007; Malovannaya et al., 2011). Nanoscale liquid chromatog- 
raphy (LC) coupled to sensitive and fast mass spectrometers 
has boosted interaction proteomics technology in recent years, 
increasing coverage and minimizing false negative rates. It has 
also enabled a paradigm shift from identification to quantification 
of interacting proteins (Bantscheff et al., 2012). Quantitative ap- 
proaches permit the use of mild immunoprecipitation (IP) proto- 
cols and allow specific binders to stand out by their quantitative 
signature even from very large backgrounds of unspecific proteins
(Mellacheruvu et al., 2013; Keilhauer et al., 2015). Additionally,
MS-based proteomics is now able to characterize entire
cellular proteomes with increasingly complete coverage (Beck 
et al., 2011; Mann et al., 2013), providing abundances and 
copy-number estimates of the expressed proteins. This should 
now allow studying the quantitative interactome as a function 
of the underlying proteome. To generate model systems that 
closely recapitulate in vivo conditions, we have previously developed
bacterial artificial chromosome (BAC) transgeneomics:
GFP-tagged proteins are expressed in mammalian cell lines
from BAC transgenes with near-endogenous expression patterns
from human or orthologous mouse loci (Poser et al.,
2008). GFP-based tags are dual-purpose in that they can be
used both for imaging and as an affinity handle. Combining these

[Figure 1 graphic. Recoverable labels: BAC library; 1,330 BAC-GFP lines; 1,125 distinct bait proteins; biological triplicates; 3,990 MS samples; specific interactors (co-enrichment, correlation); intensity; legend: HeLa proteome, interactors, baits; abundance axes in copies/cell and mol/l.]

Figure 1. Quantitative BAC-GFP Interactomics 

(A) BAC recombineering workflow for generating transgenic HeLa lines.

(B) Single-step affinity-purification, single-run liquid chromatography-tandem mass spectrometry (LC-MS/MS) workflow.

(C) Schematic protein quantification matrices in interactome and proteome samples with three dimensions of quantification.

(D) Proteome coverage and abundance distribution of the bait proteins and their interactors.

See also Tables S1, S2, and S3.



cell lines with the quantitative proteomics workflow resulted in a 
versatile and highly specific method that we termed quantitative 
BAC-GFP interactomics (QUBIC) (Hubner et al., 2010). 

Here, we applied QUBIC in a proteome-wide manner, using 
1,125 bait proteins to assemble a large-scale map of the human
interactome. We characterize individual interactions in three 
quantitative dimensions that address statistical significance, 
interaction stoichiometry, and cellular abundances of interac- 
tors. This concept provides a unique perspective on the interac- 
tome, enabling the discovery and characterization of stable and 
transient protein complexes, guiding their functional interpreta- 
tion and shedding light on the topological architecture of the 
entire network. 

RESULTS 

Quantitative BAC-GFP Interactomics 

Collections of strains or cell lines expressing tagged proteins 
are indispensable tools for many systems biology approaches 
(Huh et al., 2003). Expressing GFP-tagged proteins from engineered
BAC transgenes maintains the endogenous promoters,
intron-exon structures and regulatory elements, ensuring near-endogenous
expression levels and patterns (Poser et al., 2008)
(Figure S1A). We have previously used this system to study
chromosome segregation and the function of motor proteins
(Hutchins et al., 2010; Maliga et al., 2013). To map the protein interactome
globally, we generated a resource of 1,330 stable
BAC-GFP HeLa cell lines (Figure 1A; Table S1). Mouse BACs
are excellent surrogates for their human orthologs and offer 
additional options, such as resistance to RNAi against their 
endogenous counterparts, streamlining functional studies of 
the tagged proteins (Kittler et al., 2005). In 615 cell lines, we 
used mouse BACs with a median sequence identity of 94% 
with their respective orthologs (Figure S1B). Overall, our collection
encompasses 1,125 distinct bait proteins across all protein
classes (Figures S1C-S1E), some present as C- and N-terminally



tagged versions, or as mouse and human sequences (not 
counted as distinct). 

We performed QUBIC in three biological replicate experiments, 
resulting in 3,990 LC-MS runs recorded on an Orbitrap mass spectrometer,
taking about a year of net measuring time (Figure 1B). To
define specific interactors, we employed MaxLFQ, the label-free 
quantification (LFQ) module of the MaxQuant software (Cox and 
Mann, 2008; Cox et al., 2014). Bait proteins and their interactors 
are characterized by quantitative co-enrichment compared to 
their intensity profiles across many samples (Figures 1B and
1C), and we used generic statistical testing to determine signifi- 
cantly enriched cases (Keilhauer et al., 2015). To set thresholds 
for accepting a given candidate as an interactor, we developed 
an entirely data-driven, false discovery rate (FDR)-controlled 
approach that harnesses the absence of “negative” interactions 
and the concomitant asymmetry of the outlier population (Figures 
S1F and S1G). This approach does not rely on reference datasets
or prior knowledge for training, but nonetheless validates favorably 
against gold standards (Figures S1I-S1K).
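
To make the idea of an asymmetry-based FDR concrete, the following sketch shows one simple way such an estimate could be computed. It is an illustration under stated assumptions and not necessarily the procedure used here: we assume a single signed enrichment score per candidate, that true interactors occur only on the positive side, and that outliers on the negative side approximate the number of false positives at the mirrored cutoff. The function name `asymmetry_threshold` is hypothetical.

```python
def asymmetry_threshold(scores, target_fdr=0.01):
    """Return the smallest cutoff t at which the estimated FDR,
    (# scores <= -t) / (# scores >= t), is at or below target_fdr.

    scores: signed enrichment scores (assumed symmetric around zero
    for non-interactors). Returns None if no cutoff qualifies.
    """
    # Candidate cutoffs are the distinct absolute score values.
    candidates = sorted({abs(s) for s in scores if s != 0})
    for t in candidates:
        positives = sum(1 for s in scores if s >= t)
        negatives = sum(1 for s in scores if s <= -t)
        if positives and negatives / positives <= target_fdr:
            return t
    return None
```

The quadratic scan is kept for readability; a production implementation would sort once and sweep.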

In addition to local co-enrichment, we found the intensity pro- 
files of interacting proteins to be closely correlated globally (Sup- 
plemental Experimental Procedures). Profile correlations alone 
can indicate protein interactions when proteome samples are 
subjected to extensive native fractionation (Havugimana et al., 
2012; Kristensen et al., 2012). Here, we use them as additional 
classifiers (Keilhauer et al., 2015), and the combination of enrichment
FDRs and profile correlation coefficients defines the confidence
class of each interaction (Figure S1H).

Overall, using the information in this first dimension of proteo- 
mic quantification, our analysis resulted in 28,504 unique and 
statistically significant interactions involving 5,462 distinct pro- 
teins (Table S2). 

Interaction Stoichiometries and Protein Abundances 

A second dimension of quantification can in principle be applied 
to determine the stoichiometries of proteins within complexes. 
Figure 2. The Stoichiometry Plot

(A) Overlay of all interaction and abundance stoi- 
chiometry data for all interactions. 

(B) The characteristic triangular shape is a conse- 
quence of the dynamic range limits in the inter- 
actome (left border), in the proteome (top border) 
and the stoichiometry limit imposed by the relative 
cellular protein abundances (diagonal). Schematic 
interaction scenarios: (1) equal cellular abundance, 
stable interaction; (2) equal cellular abundance, 
weak interaction; (3) stable interaction with greater 
cellular abundance of the prey; and (4) reciprocal 
case: quantitative recovery of a stably bound, less 
abundant prey. 

(C) Stoichiometry plot of interactions between proteins annotated as CORUM complex members. The area of highest density can be approximated by a circle 
containing 58% of CORUM interactions. 

See also Data S1.
These can be computationally extracted from label-free affinity
purification data with accuracies reaching those of methods us- 
ing isotopically labeled reference standards (Wepf et al., 2009; 
Smits et al., 2012). If a protein complex contained one copy of 
each subunit, one might expect them to be retrieved in equimolar 
amounts after immunoprecipitation (IP). However, in practice, 
measured stoichiometries between preys and baits span orders 
of magnitude (Collins et al., 2013; Hauri et al., 2013). This is 
because the observed stoichiometries depend on more than 
the initial composition of the individual complexes in the cell. 
For instance, limited kinetic and thermodynamic stability can 
result in substoichiometric recovery. Proteins may also reside 
in different alternative molecular assemblies with fractions of 
their total cellular pools. Hence, we hypothesized that globally, 
interaction stoichiometries might reflect the stability of a given 
protein-protein interaction and depend on the extent that inter- 
actors are engaged with each other. The cellular abundance 
of an interactor can be limiting for how much is recoverable 
after immunoprecipitation (IP), setting a lower bound for the 
interaction stoichiometry. We therefore reasoned that cellular 
copy numbers would provide a crucial third quantitative 
dimension. 

For each pair of interacting proteins, we first quantified their 
stoichiometry in the immunoprecipitates using a sophisticated 
label-free strategy for absolute protein quantification (Supple- 
mental Experimental Procedures). To determine the precision 
of our method, we systematically compared interaction stoichi- 
ometries from experiments where the same bait proteins were 
tagged on different termini, representing entirely separate exper- 
iments (Figures S2A and S2E). Stoichiometries showed high cor- 
relation, precision within a factor of three, and no systematic bias 
for a given terminus. This confirmed that our approach robustly 
delivers interaction stoichiometries in high throughput; however, 
these may not always be sufficiently accurate to reliably specify 
copy numbers of each subunit, even for stable complexes. We 
repeated the analysis for cases where either the mouse or human 
ortholog of a protein was used as bait, demonstrating the same 
level of reliability and no species bias (Figures S2B-S2D). This 
highlights the extraordinary degree of conservation of protein 
function in evolutionary time (Kachroo et al., 2015) and suggests
that our human-centric dataset is representative not only of
human but also of other mammalian species.



To add a third dimension of proteomic quantification, we next 
performed a whole proteome quantification experiment on the 
parental HeLa cell line that all our BAC-transgenic lines are 
derived from, to a depth of about 9,000 proteins. To estimate 
cellular protein abundances, we applied our label-free approach 
and scaled the values to copies per cell using the “proteomic 
ruler” concept (Wisniewski et al., 2014) (Supplemental Experi- 
mental Procedures). The proteome dataset provided cellular 
copy numbers for 5,305 proteins of the interactome dataset, 
covering 97% of all interactors (Figure 1D). The abundances of
interacting proteins closely follow the distribution of bait abun- 
dances, covering the entire dynamic range of the proteome. 
This demonstrates that our BAC-based system recapitulates 
the in vivo situation, enabling us to probe the interactome as a 
function of the endogenous cellular proteome. 
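
The scaling to copies per cell can be illustrated with a short calculation. The sketch below follows the general idea of the proteomic ruler, in which the summed histone signal serves as an internal standard for the DNA mass per cell; it is not the exact implementation used in this study. The default DNA content of 6.5 pg per cell is an assumption for a diploid human genome (HeLa cells deviate from this), and the function name `copies_per_cell` is hypothetical.

```python
AVOGADRO = 6.02214076e23  # molecules per mole

def copies_per_cell(intensity, histone_intensity_sum, mw_da,
                    dna_pg_per_cell=6.5):
    """Estimate protein copies per cell from label-free MS intensities.

    intensity: MS signal of the protein of interest.
    histone_intensity_sum: summed MS signal of all histones.
    mw_da: molecular weight of the protein in Da (g/mol).
    dna_pg_per_cell: assumed DNA mass per cell in picograms.
    """
    # Histone mass per cell is taken to be roughly equal to DNA mass,
    # so the intensity ratio converts directly to protein mass in grams.
    protein_mass_g = intensity / histone_intensity_sum * dna_pg_per_cell * 1e-12
    return protein_mass_g * AVOGADRO / mw_da
```

For example, a 50 kDa protein carrying one thousandth of the total histone signal would come out at roughly 78,000 copies per cell under these assumptions.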

Quantifying the Interactome in Two Additional 
Dimensions 

Having established a set of specific interactions with the first 
dimension of quantification, we next combined our second and
third dimensions of quantification, namely the interaction stoichiometries
and relative cellular abundances of interactors. A plot of
the stoichiometry landscape for each bait protein is a powerful 
tool to organize its interactome, because each region reflects a 
different scenario (Figures 2A and 2B): stable, one-to-one, and 
fully recovered complexes in which the partners have equal 
cellular abundance appear around the origin of the plot (case 1 
in Figure 2B). Superstoichiometry, the recovery of more prey 
than bait, is only expected for stable complexes containing 
more prey than bait copies and indeed we find few of these. If in- 
teractions are weak and complexes dissociate partially during 
IP, or if interactions involve only part of the bait pool, interactors 
are recovered at substoichiometric levels (case 2), reflecting 
lower occupancy of interaction interfaces of the bait. A vast pre- 
dominance of sub- over superstoichiometry confirms our initial 
hypothesis that stability and occupancy are the main determi- 
nants for most interactions. 

We observed many cases of stable interactors (~1:1 interaction
stoichiometry) that involved a more abundant prey (case
3), such as the interaction of the abundant GTP-binding protein 
RAN with its guanine-nucleotide releasing factor RCC1 or that 
of α-tubulin with the NEK9 kinase (see Table S2). The reciprocal
interaction stoichiometry readouts are necessarily smaller than
one, because any higher abundant bait can maximally recover 
the entire pool of its lower abundant prey (case 4). (Note 
that this would be the default case for overexpressed baits.) 
We retrieved substoichiometric interactions over an estimated 
five orders of magnitude; for example, NEK9 was recovered at 
6 X 10“® X the amount of α-tubulin. The proteome-interactome
relationship requires that interactors can only be recovered to 
the extent permitted by their abundance and translates into a 
diagonal cut-off in the plot, which results in a characteristic trian- 
gular shape of the “cloud” of interactions (Figure 2A). 
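
The coordinates of the stoichiometry plot and the four schematic scenarios can be expressed compactly. The sketch below is illustrative only: the function names, the tolerance `tol` (in log10 units), and the exact decision boundaries are assumptions, since the text defines the scenarios qualitatively.

```python
import math

def stoichiometry_point(prey_ip, bait_ip, prey_copies, bait_copies):
    """Return (x, y) in log10 units.

    x: interaction stoichiometry, prey/bait amount in the IP.
    y: abundance stoichiometry, prey/bait copies per cell.
    """
    x = math.log10(prey_ip / bait_ip)
    y = math.log10(prey_copies / bait_copies)
    return x, y

def classify(x, y, tol=0.5):
    """Assign a point to one of the schematic scenarios (tol is hypothetical)."""
    if abs(x) <= tol and abs(y) <= tol:
        return "case 1"  # equal abundance, stable interaction
    if x < -tol and abs(y) <= tol:
        return "case 2"  # equal abundance, substoichiometric recovery
    if abs(x) <= tol and y > tol:
        return "case 3"  # stable interaction, more abundant prey
    if y < -tol and abs(x - y) <= tol:
        return "case 4"  # less abundant prey recovered quantitatively
    return "other"

def within_abundance_limit(x, y, tol=0.5):
    """A prey cannot be recovered beyond its cellular abundance: x <= y."""
    return x <= y + tol
```

The last function encodes the diagonal cut-off: points far to the right of the line x = y would imply recovering more prey than the cell contains relative to the bait.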

Approximately 10% of our interactions connected members 
of well-characterized complexes annotated in the CORUM 
database (Ruepp et al., 2010). They populate a confined area 
characterized by a signature of balanced stoichiometries (case 
1 in Figure 2B). Thus the prototypical case of a stable protein 
complex as typically described in the literature mostly consists 
of proteins of equal cellular abundances that are all constitutively 
bound to each other. 

Extrapolating from the signature of known complexes, we 
reasoned that deduction of similar complexes should be 
possible solely from the stoichiometry signature of individual 
baits as opposed to analysis of the entire network (Collins 
et al., 2007; Hart et al., 2007). We filtered our data for those 
featuring the core stoichiometry signature (Figure 3B), yielding 



a larger cluster connecting several molecular assemblies such 
as major cytoskeletal proteins, the nuclear pore complex and 
the ribosome as well as 194 isolated putative core complexes 
(Figure 3A). These recapitulated the majority of CORUM-anno- 
tated complexes that involve our bait proteins (Figure 3C). 
We confirmed the known tendency of large complexes to be 
well annotated (Havugimana et al., 2012), while smaller assem- 
blies lacked previous description (Figure 3D). The largest of 
our 125 networks with no database annotation at the time 
is the recently discovered COMMD/CCDC22/CCDC93 (CCC) 
complex (Phillips-Krawczak et al., 2015). 

The stoichiometry plot offers a unique opportunity for com- 
paring the overlap of our dataset with published data (Figures 
S3A-S3C). For instance, the intersection with a recent co-fractionation
interactome study (Havugimana et al., 2012) closely
recapitulated the core-complex signature, with 26% of our core- 
interactions overlapping with that study. This indicates that the 
co-fractionation methodology offers an attractive short-cut to 
finding stable, obligate core complexes. Conversely, the overlap 
with IRefWeb, a portal of consolidated protein interactions from 
different sources (Turner et al., 2010), reached much further into 
the substoichiometric region, beyond stable complexes, but still 
only covered 16% of our dataset. Finally, the overlap with recent
large-scale yeast-two-hybrid data (Rolland et al., 2014) was low
(0.4%) and mostly limited to cases characterized by quantitative
prey recovery to the extent permitted by cellular abundance.
Moreover, the stoichiometry plot quantitatively confirmed the
intuitive notion that high-stoichiometry interactions are easier
to detect as they are enriched in the 1% FDR compared to the
5% FDR cohort (Figures S3D and S3E). This is also reflected in 
the overlap of gene ontology (GO) annotations in pairs of inter- 
acting proteins (Figure S3F). 

Interactions Explain Phenotypes and Genetic 
Associations 

Our dataset provides an extensive resource that can be mined 
for new or poorly characterized protein interactions. For 
instance, among the interactors of SUCO, one other protein, 
TAPT1, stood out by its core stoichiometry signature, suggesting
a novel, stable complex consisting of these two low abundant in- 
tegral membrane proteins of the ER (Figures 4A, 4B, and S4A- 
S4C). Mutants of their murine orthologs exhibit severe defects 
during skeletal development: Truncation of TAPT1 causes trans- 
formations in the axial skeleton and perinatal lethality (Howell 
et al., 2007), whereas loss of the SUN domain-containing ossifi- 
cation factor SUCO (also known as OPT) impairs postnatal bone 
formation, causing fractures and neonatal death (Sohaskey 
et al., 2010). The latter study linked the phenotype to impaired 
rough ER expansion and consequent failure of osteoblasts to 
secrete collagen required for bone formation. Knockdown of hu- 
man SUCO increased the cells’ resistance against ricin, whose 
toxicity depends on endocytosis and retrograde trafficking to 
the ER (Bassik et al., 2013). Similarly, the yeast ortholog of 
TAPT1, EMP65 (YER140W), is involved in protein folding in the 
ER and shows buffering genetic and physical interactions with 
the SUN domain protein SLP1 (YOR154W) (Jonikas et al., 
2009; Friederichs et al., 2012). We used our interaction method- 
ology on GFP-tagged strains to confirm this complex (Figures 
S4D and S4E). Similarly, we validated the reciprocal interaction 
in the mammalian system using TAPT1 as bait (Figure S4B). 
Together, our findings establish TAPT1-SUCO as the higher 
eukaryote ortholog of SLP1-EMP65: a low abundant ER mem- 
brane complex that is required for normal skeletal development. 

Going beyond stable complexes, we discovered an interaction 
between the anaphase promoting complex or cyclosome (APC/C)
and the uncharacterized protein KIAA1430. The stoichiometry
plot indicated that KIAA1430 is of lower cellular abundance and 
is not an obligate member of the APC/C, as the partners were 
recovered substoichiometrically at ~1% of the respective baits
in reciprocal experiments (Figures 4C and 4D). 

To independently test whether KIAA1430 was indeed a transient
interactor of the APC/C, we performed a purify-after-mixing
(PAM)-SILAC experiment (Wang and Huang, 2008) (Figures 4E
and 4F; Supplemental Experimental Procedures). We mixed
differentially SILAC-labeled lysates from tagged and control
cell lines before the affinity step. Subsequently measured SILAC
ratios are indicative of the stability of the interaction, because
transient interactors exchange dynamically with unbound counterparts,
shifting their ratio toward unity, whereas stable interactors
maintain their label ratio. Our results confirmed that only
known subunits of the APC/C are stably bound to the core subunit
CDC23. Consistently, only some of them were recovered
when assayed for binding to KIAA1430 and with ratios indicating



a high degree of dynamic exchange. Next, we tested whether 
KIAA1430 is a substrate of the APC/C by monitoring its levels
during mitosis and early G1 phase. Unlike known substrates,
KIAA1430 levels remained stable (Figure S4F).
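
The purify-after-mixing readout described above can be summarized with a simple index. This is a hypothetical summary statistic introduced for illustration and not a quantity reported in this study: it places a prey's SILAC ratio on a log scale between the bait's ratio (no exchange) and 1:1 (full exchange). The function name `exchange_index` is an assumption.

```python
import math

def exchange_index(prey_ratio, bait_ratio):
    """Position of a prey between stable and fully exchanged binding.

    Returns 0.0 when the prey keeps the bait's label ratio (stable) and
    1.0 when its ratio is 1:1 (complete exchange with unlabeled protein).
    Ratios are tagged-over-control; bait_ratio must exceed 1.
    """
    if bait_ratio <= 1 or prey_ratio <= 0:
        raise ValueError("expected bait_ratio > 1 and prey_ratio > 0")
    idx = 1.0 - math.log(prey_ratio) / math.log(bait_ratio)
    # Clamp to [0, 1] to absorb measurement noise beyond either extreme.
    return min(1.0, max(0.0, idx))
```

For label-swapped replicates, the ratios would first need to be inverted so that they all read tagged over control.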

In interphase, a fraction of GFP-tagged KIAA1430 localized to
the centrosomes, in particular the centrioles, and was largely
excluded from the nucleus (Figures 4G, 4H, S4G, and S4H), while
the APC/C is known to be predominantly nuclear (Kraft et al.,
2003; Hubner et al., 2010). During mitosis, after nuclear envelope
breakdown (NEBD), APC/C accumulates on mitotic spindles,
centromeres, and centrosomes (Kraft et al., 2003; Acquaviva
et al., 2004), reflecting a partially common localization with
KIAA1430. Consistently, we confirmed the APC/C-KIAA1430
interaction in mitotically arrested, but not in interphase cells (Figure 4I).
To functionally investigate the mitotic interaction, we
used time-lapse microscopy to determine the time cells require
from NEBD to the onset of anaphase as a function of APC/C activity.
KIAA1430 knockdown resulted in a mild delay that was
sensitive to reversine, a small molecule inhibitor of the mitotic
checkpoint kinase MPS1 (Figures 4J and 4K) (Santaguida
et al., 2010). These findings suggest that the depletion of
KIAA1430 activates the spindle assembly checkpoint, thereby
postponing the activation of the APC/C. Recent reports identified
the ciliary protein hemingway as the Drosophila ortholog of
KIAA1430 (Soulavie et al., 2014) and implicated the APC/C in
regulating ciliary length and polarity (Ganner et al., 2009; Wang
et al., 2014). Given that centrioles are common features of cilia
and centrosomes, our data suggest that in human cells,
KIAA1430 recruits a sub-fraction of the APC/C to the centrosome
to facilitate mitotic progression.

These examples illustrate how the combination of three quan- 
titative dimensions offers a unique view on the interactions of 
individual proteins that extends beyond their identification and 
facilitates their functional investigation. 

We have compiled this information into an easily usable 
resource, provided as Data S1 and available via the IntAct
database. For each of the 1,330 tagged cell lines, we present a
concise, one-page summary outlining the abundance of the 
bait protein, the co-enrichment and confidence classification of 
candidate interactors along with the stoichiometry plot and the 
predictions of the core complexes. A reading guide is presented 
in Figure S5. 

The Relevance of Substoichiometric Interactions 

Our study revealed that interactions within obligate complexes 
constitute only a small minority of the interactome. We reasoned 
that the majority of remaining interactions should be of a func- 
tionally and conceptually different nature, as indicated by our 
example of the KIAA1430-APC/C interaction.

To investigate the interplay of the different types of interac- 
tions, we interrogated the chaperonin TRiC (also called CCT), 
which is known to act on a large number of client proteins (Hartl
et al., 2011). Its core machinery of eight subunits was clearly 
identified as an abundant obligate complex (Figure 5A) and rep- 
resents a prominent hub in our interactome dataset. Virtually all 
interactors co-enriched with tagged TRiC core subunits were co- 
chaperones, regulatory proteins of the phosducin family or pro- 
teins containing known substrate motifs (Yam et al., 2008) (Table 
Figure 4. The TAPT1-SUCO Complex and the KIAA1430-APC/C Interaction

(A) Stoichiometry plot indicates stable TAPT1-SUCO complex. 

(B) Immunofluorescence of TAPT1 and SUCO in HeLa shows ER localizations. 

(C) Stoichiometry plot of APC/C interactors (bait: CDC23). Known core complex members (blue) and KIAA1430 (red) as novel substoichiometric 
interactor. 

(D) Stoichiometry plot of KIAA1430 (red) interactors shows APC/C subunits (blue) as substoichiometric interactors. 

(E) PAM-SILAC ratios plotted as medians of forward triplicate against label-swapped reverse triplicate (bait: CDC23). 

(F) PAM-SILAC data using KIAA1430 as bait. Baits and stable interactors are recovered at ratios corresponding to label incorporation levels. Ratios of transient
interactors are shifted toward 1:1.

(G) Maximum intensity projections of living interphase and mitotic cells expressing KIAA1430-LAP and histone 3.1-iRFP indicate that KIAA1430 localizes to
centrioles.

(H) Co-localization of KIAA1430 with centrosomal marker γ-tubulin and the centriolar protein CEP135.

(I) Western blot analysis of ANAPC3 IPs and corresponding flow-throughs (FT) from interphase and mitotically arrested cells expressing KIAA1430-LAP.

Figure 5. The TRiC Interactome Is Defined by Substoichiometric Links

(A) Stoichiometry plot of CCT3 interactors, representative of TRiC core subunits.

(B) PAM-SILAC results from the same bait protein.

(C) Reciprocal stoichiometry plot of averaged positions of the TRiC subunits from all bait pull-downs enriching at least three TRiC subunits. Symbol size indicates
profile correlation. See also Tables S4 and S5.

(D) Systematic comparison of interaction stoichiometries and PAM-SILAC ratios for all interactions observed using CDC23, KIAA1430, and CCT3 as baits. 



S4). Characteristic of all was a lower cellular abundance than 
TRiC (except for some cytoskeletal proteins) and substoichio- 
metric recovery, classifying these interactors as distinct from 
the core subunits. When we performed a PAM-SILAC experi- 
ment to test for stable versus transient binding, the core 
complex composition that we had already established by the 
stoichiometry plot was confirmed, as these were all found 
to be stable binders (Figure 5B). Other interactors were tran- 
sient, as their SILAC ratios indicated full dynamic exchange. 
Notable exceptions were some regulatory proteins and abun- 
dant cytoskeletal substrates, whose ratios lay between stable 
and fully dynamic binders. We consistently found the unchar- 
acterized protein FAM203A/B as a substoichiometric inter- 
actor with intermediate dynamic exchange behavior. Its ortholog 
in Caenorhabditis elegans shows a cytoskeletal knockdown 
phenotype (Fievet et al., 2013). All of this was reminiscent of 
phosducin proteins, which TRiC requires to fold actin and tubulin 
(Hayes et al., 2011), and we therefore speculate that FAM203A/B 
might have a similar function. 

In reciprocal interaction experiments, TRiC core complex 
members were co-enriched by ~5% of all bait proteins (Fig- 
ure 5C; Table S5). This is in line with estimates of TRiC being 
involved in folding of 5%-10% of the proteome (Marti et al., 
2011). However, only some of these baits were also found in 
the reciprocal TRiC IPs. This asymmetry can be explained with 
knowledge of the underlying proteome: At 1.3 million copies of 
the hexadecameric complex, TRiC is much more abundant 
than most substrates, of which only a fraction will be in the pro- 
cess of folding at any given time (Table S3). Consequently, only a 
minute fraction of the TRiC pool will be acting on each substrate 



and its recovery will be “diluted” to substoichiometric levels. In 
the reciprocal case, however, TRiC occupies a significant frac- 
tion of the client protein population— the fraction in the state of 
folding — rendering the interaction more readily detectable within 
the dynamic range (Figure 5C). 

The stoichiometry of TRiC recovery in the substrate IPs ranges 
from less than 10“^ to above 10“^ (Figure 5B). With TRiC sub- 
strates comprising 5%-10% of all protein molecules (a HeLa 
cell contains an estimated 6 × 10^9 protein molecules), our stoi- 
chiometry data imply that on average 0.2%-0.4% of them are 
bound to the chaperone at any time. While substoichiometry 
may be thought to be of lower biological relevance, these inter- 
actions fulfil important functions as they connect a very diverse 
set of protein classes. Moreover, our data also illustrate how 
the proteome-interactome relationship balances the amount of 
TRiC with the cumulative amount of its substrates. 
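
The order of magnitude of this estimate can be checked with a short calculation. The sketch below is a back-of-envelope illustration and not the stoichiometry-based derivation used in the study: it assumes, purely for illustration, that every TRiC complex is occupied by exactly one substrate molecule, and it uses only the figures quoted in the text (1.3 million TRiC complexes, roughly 6 × 10^9 protein molecules per HeLa cell, and substrates making up 5%-10% of all protein molecules). The function name is hypothetical.

```python
# Back-of-envelope check of the fraction of TRiC substrates bound at any time.
# Assumption (illustration only): each TRiC complex holds one substrate molecule.

TRIC_COMPLEXES = 1.3e6     # copies of the hexadecameric complex per cell (from the text)
PROTEINS_PER_CELL = 6e9    # estimated protein molecules per HeLa cell (from the text)


def bound_fraction(substrate_share: float) -> float:
    """Fraction of substrate molecules bound to TRiC, given the share of all
    protein molecules that are TRiC substrates."""
    substrate_molecules = substrate_share * PROTEINS_PER_CELL
    return TRIC_COMPLEXES / substrate_molecules


for share in (0.05, 0.10):
    print(f"substrate share {share:.0%}: {bound_fraction(share):.2%} bound")
```

Under this simplifying assumption the bound fraction comes out at roughly 0.2%-0.4%, in line with the range stated above.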

Extrapolating from our APC/C, KIAA1430, and TRiC case 
studies, we investigated whether the different stoichiometric 
classifications of interactions carry over to other characteristics: 
first, we systematically compared interaction stoichiometries 
with dynamic exchange data for all interactions for which both 
orthogonal pieces of information were available (Figure 5D). 
There was almost perfect congruence of stoichiometric interac- 
tors with kinetically stably bound proteins and a surprisingly 
good overall correlation of substoichiometric recovery and the 
extent of dynamic exchange. This indicates that interaction stoi- 
chiometries are globally predictive of the biophysical stability of 
an interaction. Next, we investigated whether interaction stoichi- 
ometry is indicative of co-expression across tissues or cell types. 
We extracted protein abundance correlation profiles across 



(J) Western analyses showing the extent of KIAA1430 depletion before and after the time-lapse analyses presented in (K). 

(K) Time KIAA1430-depleted cells require to proceed from NEBD to anaphase, compared to control cells (n = 300 each); 0.5 μM reversine rescues the delay (n = 
200 each). Red lines, mean. Significance according to two-tailed Mann-Whitney test. Scale bars, 10 μm. 

See also Figure S4. 

Figure 6. Strong and Weak Interactions Have Different Global Properties 

(A) Sub-network of complexes surrounding RNA polymerases I/II/III. Proteins are colored by complex, edges by profile correlation, edge widths represent 
interaction stoichiometries. 

(B) Effect of sequential removal of substoichiometric interactions on network sizes. Indicated are points where edge removal results in two fragmented sub- 
networks. 

(C) Global network effect of random or targeted removal of interactions on the total number of isolated sub-networks. 

(D) Effect on the number of proteins present in the largest entirely connected sub-network. 

(E) Effect on the fraction of total connected proteins that are part of this largest sub-network. 

See also Figure S6. 



many tissues from a recent human proteome draft dataset 
(Kim et al., 2014). While co-expression coefficients scattered 
widely, there was still a notable relationship with interaction 
stoichiometry, with high-stoichiometry interactors more likely 
to be coherently expressed (Figure S6A). This is in agreement 
with earlier findings in yeast showing that members of stable 
complexes are enriched in co-regulated modules (Simonis 
et al., 2006). Conversely, substoichiometric interactions involve 
proteins that are not necessarily tightly co-regulated. 

Finally, we tested whether interaction stoichiometry is predic- 
tive of the role of an interaction in network topology. We analyzed 
a sub-network of interactors surrounding RNA polymerases I, II, 
and III, recapitulating shared subunits and interactions with other 
complexes, such as general transcription factor complexes, 
the negative elongation factor (NELF) complex, the mediator 
complex, and the polymerase-associated factor (PAF) complex 
(Figure 6A). Sequential in silico removal of the most substoichio- 
metric interactions from the network leads to fragmentation 
events, in which the individual complexes gradually lose their in- 
terconnections and emerge as individual modules (Figure 6B). 
Finally, the three polymerases remain internally connected via 
their shared subunits. Removing interactions in the reverse order 
does not lead to any network fragmentation, but rather results in 
roughly linear shrinkage of the network (Figure S6A). 



Taking this approach to a global level, we probed the response 
of our entire network to the removal of edges according to their 
stoichiometry characteristics. Seminal studies on the topology of 
networks have shown that scale-free networks are resilient to 
random removal of edges, but sensitive to targeted attacks (Al- 
bert et al., 2000). Specifically, analysis of the network structure 
identifies the topologically most critical edges, removal of which 
leads to rapid network fragmentation. 

In our case, we targeted edges for removal solely by their 
“local” interaction stoichiometry readout, agnostic to their global 
network roles. We removed edges sequentially, starting at either 
the lowest or highest interaction stoichiometry, comparing this 
with random removal of edges. 

This revealed vastly different network responses (Figure 6C). 
The most substoichiometric interactions turned out to be most 
critical for network topology: Their preferential removal led to a 
rapid increase of the number of isolated network fragments, 
whereas removing the strongest 50% of edges hardly resulted 
in any network fragmentation (Figure 6C). The largest connected 
component, which causes the typical “hairball” appearance of 
large-scale networks, shrank approximately linearly with removal of 
weak interactions (Figure 6D) and also left more proteins entirely 
unconnected (Figure S6A). Conversely, preferential removal of 
edges from the other end of the stoichiometry scale led to a 
network response that increased its small-world characteristics: 
the largest network encompasses the vast majority of connected 
proteins (Figure 6E), fewer proteins are left without connections 
(Figure S6B) and isolated network fragments are smaller (Fig- 
ure S6C). Similar patterns of network response were observed 
in a study analyzing mobile phone communication networks by 
removing the strongest versus the weakest interactions (Onnela 
et al., 2007). In analogy to that study, and based on our findings 
of the relationship between interaction stoichiometries and ki- 
netic stabilities (Figure 5D), we propose a strong/weak terminol- 
ogy for interaction stoichiometries and term interactions with 
near-stoichiometric recovery of the prey “strong” and substoi- 
chiometric interactors “weak.” 

Together, our analyses show that interaction stoichiometries, 
which are local properties derived from single interaction exper- 
iments, predict the global behavior of the proteins involved: 
strong interactions are indicative of proteins that are co-regu- 
lated across cell types. In the network, they form modules of 
high interconnectivity, rendering the network topologically resil- 
ient to their removal. Weak interactions, on the other hand, domi- 
nate the network both in numbers and by their topologically 
critical role as long-range interactions between more diverse 
sets of proteins. As a consequence, interaction networks can 
be fragmented into individual, defined modules, by identifying 
and removing weak links. In summary, availability of interaction 
stoichiometries on a global scale effectively allows us to 
“comb” the interactome hairball, to identify modules, and visu- 
alize their interconnectedness. 

DISCUSSION 

Here, we have introduced a novel concept of interactome anal- 
ysis. Using an efficient, low-stringency IP protocol and accurate 
label-free quantification of both the IPs and the complete prote- 
ome, we extracted three quantitative dimensions, all of which 
proved critical for characterizing protein interactions. While the 
first dimension identifies statistically significant interactions, 
the second and third dimensions define their stoichiometric 
contexts. Earlier large-scale studies did not include or interpret 
these additional dimensions, in part because of the challenges 
involved in extracting accurate quantitative values. Moreover, 
past studies often employed overexpression of bait proteins, 
precluding meaningful stoichiometry readout (Gibson et al., 
2013), and near-complete proteome coverage was also often 
not attainable. 

Finding stable protein complexes is usually a major goal 
of interactomics studies. We showed that obligate protein 
complexes feature a unique signature of balanced stoichiome- 
tries— an infrequent occurrence among the multitude of inter- 
actions. Such a signature led us to discover the TAPT1-SUCO 
complex in the ER membrane. This complex ties together a 
body of available evidence, including knockout phenotypes of 
both TAPT1 and SUCO and genetic interactions of their yeast or- 
thologs. As a representative of the majority of weaker, non-obli- 
gate interactions, we characterized the binding of KIAA1430 
to the APC/C, suggesting that low interaction stoichiometries 
are the result of an interaction that is limited to centrioles in 
mitotic cells and biophysically weaker than interactions between 



APC/C core members. Furthermore, our stoichiometry-based 
classification subdivided the interactome of the TRiC chaperonin 
into obligate core complex subunits, regulatory interactors, and 
a large number of substrates. We found that lack of reciprocal 
verification can be indicative of an inherently asymmetric nature 
of biologically relevant interactions, particularly outside obligate 
core complexes. This example also illustrates how the observed 
interactome is shaped by protein abundances and, conversely, 
implies overall regulation of protein abundances by protein inter- 
actions. Therefore, the interactome always has to be interpreted 
as a function of the underlying proteome. 

We have shown that interaction stoichiometries generally 
correlate with the biophysical stability of an interaction. Weak in- 
teractions have frequently gone undetected in interactome 
studies and may be thought to be less important; nevertheless 
they are crucial features of networks in general and social net- 
works in particular (Granovetter, 1973; Csermely, 2006). Our 
study directly and quantitatively demonstrates the predominance 
of weak interactions in the protein interactome. MS-based 
methods cover more than four orders of magnitude of interaction 
stoichiometry (Collins et al., 2013), and our low-stringency 
biochemical workflow ideally harnesses this sensitivity. However, 
substoichiometric interactions involving low abundance preys 
can still be challenging to detect (Figures S3D and S3E). There- 
fore, the prevalence of weak interactions is likely to be even 
more pronounced and their relevance vastly underappreciated. 

Previous studies typically counted all interactions as equal, 
once they had been accepted based on their statistical parame- 
ters or scores. Therefore, the roles of individual interactions had 
to be predicted from prior knowledge or from global network 
properties. Highly connected proteins were described as inter- 
action hubs, regions of high clustering coefficients with many 
shared pathway annotations were characterized as complexes 
(Collins et al., 2007; Hart et al., 2007), and weak interactions 
were inferred from weaker connectivity patterns (Malovannaya 
et al., 2011). However, limited coverage of the interactome is a 
confounding factor for such strategies. 

In contrast, we here have shown directly that local stoichiom- 
etry data reflect global network topological properties of interac- 
tions, setting the stage for quantitative network analysis from the 
ground up. 

Substoichiometric interactions form the “glue” that holds the 
cellular network together— as shown specifically for the RNA po- 
lymerase network and globally for the entire network— and are 
hence critical for network structure. This property, which may 
seem counterintuitive at first, prompted us to propose interac- 
tion stoichiometry as a measure of interaction strength. Of 
note, a range of underlying mechanisms can cause a weak inter- 
action according to this terminology, for instance low biophysical 
affinity, high kinetic exchange rates, limited spatiotemporal over- 
lap of interactors, or indirect interactions that are bridged via 
other biomolecules, all of which may result in a substoichiometric 
readout. If such weak links are removed from the network, it col- 
lapses into defined modules that are tightly interconnected by 
the remaining strong links. Translated into biological terms, sta- 
ble complexes would remain in isolation, but without weak links, 
they would not be able to connect to each other or to transient, 
dynamic regulators. 

A major contribution of this study lies in the characterization of 
the interactomes surrounding more than 1,100 different baits, 
which together cover a large part of the expressed proteome 
with more than 28,000 interactions. We present our results in 
an accessible format that can be easily mined and interpreted 
by non-specialists. Our resource of mammalian cell lines ex- 
pressing GFP-tagged proteins under endogenous control can 
be employed for other studies (e.g., focusing on subcellular local- 
ization or functional characterization of individual proteins). The 
interaction data validate these cell lines for such uses and the 
use of mouse orthologs as surrogates. They also imply remark- 
ably similar protein interactomes between human and mouse. 

We approach saturation with respect to the number of proteins 
that can be covered (Figure S6D), but observe only part of the 
entire interactome directly, which our data predict to encom- 
pass between 80,000 and 180,000 detectable interactions in 
HeLa (Figure S6E). Our additional quantitative dimensions may 
prove helpful for increasing interactome coverage in silico, for 
example, by selective matrix expansion (Seebacher and Gavin, 
2011). Given its usefulness in interpreting interaction data, the 
stoichiometry readout developed here can become a general 
basis for future interactome studies and for the analysis of inter- 
actome dynamics, which will manifest foremost as quantitative 
alteration of occupancies rather than qualitative gain or loss of 
interactors. 

EXPERIMENTAL PROCEDURES 
Cell Culture 

HeLa Kyoto cell lines expressing N- or C-terminally tagged proteins from 
BAC transgenes were generated, cultured, and imaged as previously 
described (Poser et al., 2008). Tags are based on the “localization and affinity 
purification” (LAP) tag, consisting of GFP and a functionalized linker. All BAC 
cell lines and tag sequences are listed in Table S1 along with proteome and 
interactome metadata on the bait proteins. Cells were grown to near-conflu- 
ency on two 15-cm cell culture dishes per interaction experiment, detached 
with Accutase, and snap frozen. Three replicates were harvested in at least 
two different passages. 

Affinity Purification and Mass Spectrometry 

Cell pellets were lysed and subjected to affinity purification on a robotic sys- 
tem, followed by single-shot mass spectrometric analysis on an Orbitrap in- 
strument (Hubner et al., 2010). We processed triplicates separately on different 
days and carried out MS-analyses in randomized order over the course of 
weeks to months. 

Whole Proteome Measurements 

HeLa cells were lysed in guanidinium chloride lysis buffer and digested 
sequentially with LysC and trypsin as described (Kulak et al., 2014). Peptides 
were desalted on stacked C18 reverse phase (Waters Sep-Pak) and strong 
cation exchange cartridges and eluted using 70% acetonitrile. Pooled eluates 
were separated into six fractions on strong anion exchange (SAX) StageTips 
(Wisniewski et al., 2010). MS measurements were performed in three repli- 
cates on a quadrupole Orbitrap mass spectrometer (Kulak et al., 2014). 

Data Processing 

Raw files were processed with MaxQuant (Cox and Mann, 2008) (version 
1.3.9.10) in several sets, each containing ~600 randomly assigned AP-MS 
runs and the HeLa proteome fractions. Tandem mass spectrometry (MS/ 
MS) spectra were searched against a modified version of the November 
2012 release of the UniProt complete human proteome sequence database. 
For each bait protein expressed from a mouse BAC locus, the human 
sequence in the fasta file was concatenated with the mouse sequence (unless 



identical). Human identifiers were used for mapping purposes. We used 
MaxLFQ, MaxQuant’s label-free quantification (LFQ) algorithm, to calculate 
protein intensity profiles across samples (Cox et al., 2014). We required one 
ratio count for each pairwise comparison step and activated the FastLFQ 
setting with two minimum and two average comparisons to enable the normal- 
ization of large datasets in manageable computing time. 

Detection of Protein Interactions 

Protein identifications were filtered, removing hits to the reverse decoy data- 
base as well as proteins only identified by modified peptides. We required 
that each protein be quantified in all replicates from the AP-MS samples of 
at least one cell line. Protein LFQ intensities were logarithmized and missing 
values imputed by values simulating noise around the detection limit. For 
each protein, a non-parametric method was used to select a subset of sam- 
ples that provide a distribution of background intensities for this protein (Sup- 
plemental Experimental Procedures). This subset was used first to normalize 
all protein intensities to represent relative enrichment and then to serve as 
the control group for a two-tailed Welch’s t test. Specific outliers in the volcano 
plots of logarithmized p values against enrichments were determined by an 
approach making use of the asymmetry in the outlier population (Figures 
S1E and S1F). We used two cut-offs of different stringencies, representing 
1% and 5% of enrichment false discovery rate (FDR), respectively. Correlation 
coefficients between the intensity profiles of interacting proteins were calcu- 
lated as additional quality parameters (Keilhauer et al., 2015). Enrichment 
FDR (classes A-C) and profile correlation (modifier + or -) define the confi- 
dence class of an interaction (Figure S1G). 
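
The core numerical steps of this procedure (log transformation, imputation of missing values near the detection limit, and a Welch's t statistic against a control group) can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the imputation parameters `shift` and `width` are illustrative choices, the non-parametric selection of the background subset is not reproduced, and the p value and FDR cut-offs are omitted. All function names are hypothetical.

```python
import math
import random
import statistics


def log_transform(intensities):
    """log2-transform LFQ intensities; zero or None is treated as missing."""
    return [math.log2(x) if x else None for x in intensities]


def impute_missing(values, shift=1.8, width=0.3, rng=None):
    """Replace missing values with draws from a narrow normal distribution
    placed below the observed mean, simulating noise near the detection limit.
    shift and width (in units of the observed SD) are illustrative assumptions."""
    rng = rng or random.Random(0)
    observed = [v for v in values if v is not None]
    mu, sd = statistics.mean(observed), statistics.stdev(observed)
    return [v if v is not None else rng.gauss(mu - shift * sd, width * sd)
            for v in values]


def welch_t(sample, control):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    m1, m2 = statistics.mean(sample), statistics.mean(control)
    v1, v2 = statistics.variance(sample), statistics.variance(control)
    n1, n2 = len(sample), len(control)
    se2 = v1 / n1 + v2 / n2
    t = (m1 - m2) / math.sqrt(se2)
    df = se2 ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
    return t, df
```

Here `sample` would hold the replicate intensities of one protein in one cell line and `control` the intensities of the same protein in its background subset; the enrichment plotted in a volcano plot corresponds to the difference of the two means.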

Interaction Stoichiometries and Cellular Copy Numbers 

Estimating interaction stoichiometries requires the comparison of the amounts 
of different proteins relative to each other in one IP. We first subtracted the me- 
dian intensity across all samples to account for background binding. We then 
divided LFQ intensities by the number of theoretically observable peptides for 
this protein (Schwanhausser et al., 2011). Finally, we expressed stoichiome- 
tries relative to the bait protein. Cellular copy numbers and abundances 
were calculated using a similar approach (Wisniewski et al., 2014) on the whole 
proteome data and brought to absolute scale by normalization to a total pro- 
tein amount of 200 pg in a cell volume of 1 pl for a HeLa cell. 
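
The three steps described above (background subtraction, normalization by theoretically observable peptides, and scaling to the bait) can be sketched as follows. This is an illustrative reading of the description, not the authors' code: it assumes linear-scale LFQ intensities, clips negative background-subtracted values to zero, and uses a hypothetical dictionary layout and function name.

```python
import statistics


def interaction_stoichiometries(ip_intensities, all_samples, n_peptides, bait):
    """Estimate prey:bait stoichiometries within one IP.

    ip_intensities: {protein: LFQ intensity in this IP} (linear scale, assumed)
    all_samples:    {protein: LFQ intensities of that protein across all samples}
    n_peptides:     {protein: number of theoretically observable peptides}
    bait:           name of the bait protein
    """
    abundance = {}
    for protein, intensity in ip_intensities.items():
        # 1. Subtract the median across all samples to account for background.
        specific = intensity - statistics.median(all_samples[protein])
        # 2. Normalize by the number of theoretically observable peptides
        #    (negative values are clipped to zero; an assumption of this sketch).
        abundance[protein] = max(specific, 0.0) / n_peptides[protein]
    # 3. Express every value relative to the bait protein.
    return {p: a / abundance[bait] for p, a in abundance.items()}
```

A prey recovered at a value near 1 would count as a stoichiometric interactor in the sense used in the text, and values far below 1 as substoichiometric. The sketch does not guard against a bait whose background-subtracted abundance is zero.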

Network Analyses 

Network analyses were performed based on the data listed in Table S2. For the 
purpose of counting unique interactions and for the histogram of the numbers 
of interactors, we regarded interactions as non-directional, flattened multiple 
protein groups mapping to the same gene name and to the most abundant iso- 
form and considered interactions found multiple times only once. For network 
perturbation analyses, we selected all non-self-interactions of confidence clas- 
ses A+, A, and B+ and assembled them into graphs. We then removed edges 
sequentially according to their interaction stoichiometry readout. Prey-bait 
combinations discovered multiple times were treated as separate edges. 
Once a protein had lost all its edges, it was removed. As a control, we deleted 
edges randomly and represented the median of 100 random repetitions, with 
the scatter shown as the first or third quartile ±1.5 interquartile ranges. 
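
The perturbation analysis described above can be sketched as a sequential edge-removal loop that records the number of isolated sub-networks after each step. This is a minimal illustration, not the authors' implementation: the edge representation `(protein_a, protein_b, stoichiometry)`, the function names, and the toy network are assumptions, and the random-removal control is left out.

```python
from collections import defaultdict


def count_components(edges):
    """Number of connected sub-networks among proteins that still have edges."""
    neighbors = defaultdict(set)
    for a, b, _ in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    seen, components = set(), 0
    for start in neighbors:
        if start in seen:
            continue
        components += 1
        seen.add(start)
        stack = [start]
        while stack:  # depth-first traversal of one sub-network
            node = stack.pop()
            for nxt in neighbors[node] - seen:
                seen.add(nxt)
                stack.append(nxt)
    return components


def removal_curve(edges, weakest_first=True):
    """Remove edges one at a time by stoichiometry and record the number of
    sub-networks after each removal. Proteins left without edges drop out."""
    ordered = sorted(edges, key=lambda e: e[2], reverse=not weakest_first)
    curve = []
    while ordered:
        ordered.pop(0)
        curve.append(count_components(ordered))
    return curve


# Toy network: two tightly bound modules joined by one substoichiometric link.
toy = [("A", "B", 1.0), ("B", "C", 0.9), ("A", "C", 0.8),
       ("D", "E", 1.0), ("E", "F", 0.9), ("D", "F", 0.8),
       ("C", "D", 0.01)]
print(removal_curve(toy, weakest_first=True))
print(removal_curve(toy, weakest_first=False))
```

In the toy network, removing the weakest edge first immediately separates the two modules, whereas removing the strongest edges first never fragments the network; it only shrinks.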

ACCESSION NUMBERS 

The accession number for the protein interaction data reported in this paper, 
submitted through IntAct (Orchard et al., 2014), to the IMEx Consortium 

SUMMARY 

In CFTR, the chloride ion channel mutated in cystic 
fibrosis (CF) patients, pore opening is coupled to 
ATP-binding-induced dimerization of two cytosolic 
nucleotide binding domains (NBDs) and closure to 
dimer disruption following ATP hydrolysis. CFTR 
opening rate, unusually slow because of its high- 
energy transition state, is further slowed by CF muta- 
tion ΔF508. Here, we exploit equilibrium gating of 
hydrolysis-deficient CFTR mutant D1370N and apply 
rate-equilibrium free-energy relationship analysis to 
estimate relative timing of opening movements in 
distinct protein regions. We find clear directionality 
of motion along the longitudinal protein axis and 
identify an opening transition-state structure with 
the NBD dimer formed but the pore still closed. 
Thus, strain at the NBD/pore-domain interface, the 
ΔF508 mutation locus, underlies the energetic bar- 
rier for opening. Our findings suggest a therapeutic 
opportunity to stabilize this transition-state struc- 
ture pharmacologically in ΔF508-CFTR to correct 
its opening defect, an essential step toward restoring 
CFTR function. 

INTRODUCTION 

The cystic fibrosis (CF) transmembrane conductance regu- 
lator (CFTR) is the chloride ion channel mutated in patients 
suffering from CF, a devastating multiorgan disease (O’Sullivan 
and Freedman, 2009). The majority (~90%) of CF patients carry 
at least one allele with a deletion of phenylalanine 508 (ΔF508). 
The ΔF508 mutation severely impairs both surface expression 
(Cheng et al., 1990) and chloride channel function (Miki et al., 
2010) of CFTR. Even if efforts to promote trafficking to the sur- 
face prove successful, understanding the molecular mechanism 
of the functional defect in ΔF508 CFTR will still be essential for its 
correction in CF patients. 

CFTR belongs to the family of ATP binding cassette (ABC) 
transporters (Riordan et al., 1989), which are built from two ho- 
mologous halves, each comprising a transmembrane domain 
(TMD) (Figures 1A and 1B, gray) followed by a cytosolic nucleo- 
tide binding domain (NBD) (Figures 1A and 1B, green and blue). 
In CFTR, these two halves (TMD1-NBD1 and TMD2-NBD2) are 
linked by the unique cytosolic regulatory (R) domain (Figure 1A, 
top, magenta), a target for phosphorylation by cAMP-dependent 
protein kinase (PKA) (Riordan et al., 1989); R-domain phosphor- 
ylation is a prerequisite for CFTR chloride channel activity (Tab- 
charani et al., 1991). 

Opening and closing (gating) of the CFTR chloride ion pore, 
formed by its TMDs, is coupled to a conserved ATP binding/ 
hydrolysis cycle at the NBDs (Figure 1B). In ABC proteins, ATP 
binding triggers association of the two NBDs into a stable 
head-to-tail dimer that occludes two molecules of ATP (Fig- 
ure 1B, yellow circles) at the interface (Smith et al., 2002). By 
forming strong interactions with conserved residues of both 
the Walker motifs in the head of one NBD and the signature 
sequence in the tail of the opposing NBD, these ATP molecules 
act as molecular glue that ties the NBDs together: prompt dimer 
disruption therefore requires ATP hydrolysis (Moody et al., 2002). 
In CFTR, only the composite binding site formed by Walker A 
and B motifs of NBD2 and the signature sequence of NBD1 
(site 2; Figure 1B, upper site) is catalytically active; the other 
interfacial binding site (site 1; Figure 1B, lower site) is degenerate 
and keeps ATP bound and unhydrolyzed throughout several 
NBD dimerization cycles (Aleksandrov et al., 2002; Basso 
et al., 2003). In ABC exporters, NBD dimer formation flips the 
TMDs to an outward-facing orientation, while dimer disruption 
following ATP hydrolysis and substrate release resets them to 
inward-facing; NBD-to-TMD signal transmission is mediated 
by an interface that includes four short “coupling helices” 
(CH1-4) (Locher, 2009) in TMD intracellular loops (Figure 1A, vi- 
olet loops). In CFTR, NBD dimer formation initiates a burst of 
pore openings interrupted by brief closures, while dimer dissoci- 
ation terminates the burst and returns the TMDs into a long- 
lasting nonconducting (interburst) state (Vergani et al., 2005). 
Functional studies confirm that in the bursting (“open”) state 
CFTR’s TMDs resemble the outward-facing, whereas in the in- 
terburst (“closed”) state they resemble the inward-facing 
conformation of ABC exporter TMDs (Bai et al., 2011; Cui 
et al., 2014; Wang et al., 2014a). For wild-type (WT) CFTR, gating 
is a unidirectional cycle: most openings are terminated by ATP 
hydrolysis (Figure 1B; step O1 → O2) rather than by far slower 
non-hydrolytic closure (Figure 1B; step O1 → C1; rate kOC) (Csa- 
nady et al., 2010). 

The major functional defect of ΔF508 CFTR is a >40-fold 
reduction in opening rate (Miki et al., 2010; Kopeikin et al., 
2014), which reflects destabilization, relative to the closed 

Figure 1. Choice of a Suitable Background 
Construct for REFER Analysis 

(A) Domain organizations of WT (top) and cut- 
ΔR(D1370N) (bottom) CFTR: TMDs (gray), intra- 
cellular loops containing coupling helices (light 
violet), NBD1 (green), NBD2 (blue), R domain 
(magenta), membrane (yellow). Colored circles 
identify target positions. The D1370N mutation in 
NBD2 is depicted by a red star. 

(B) Cartoon representation of the CFTR gating cycle. 
ATP-bound closed channels (C1) open to a prehy- 
drolytic open state (O1). During most openings 
ATP is hydrolyzed at composite site 2 (state O2), 
prompting NBD dimer dissociation and pore closure 
(to state C2) followed by ADP-ATP exchange (to state 
C1). Color coding as in (A). ATP, yellow circles; ADP, 
orange crescents. The D1370N mutation abrogates 
ATP hydrolysis (red cross) and confines gating in 
saturating ATP to a simple C1 ↔ O1 equilibrium (red 
box). Target positions color coded as in (A). 

See also Figures S1 and S2. 



state, of the transition state for channel opening (step C1 → O1; 
Figure 1B; rate kCO). Insight into the dynamics of this opening 
conformational change and the structure of its transition state 
would be a key step forward in understanding the molecular 
mechanism of the ΔF508 gating defect. 

Transition states, which determine the rates of functionally 
relevant conformational movements, are the highest-energy, 
shortest-lived conformations of proteins. For instance, for ion 
channels closed ↔ open conformational transitions are so fast 
that they appear as single steps even in the highest time-resolu- 
tion recordings, implying that the time the channel protein 
spends in the transition state itself is on the sub-microsecond 
scale— in contrast to the long (milliseconds-seconds) inter- 
vals spent in comparatively stable open and closed ground 
states observable in single-channel recordings (Figures 2A, 
3A, 4A, and 5A). Intractable by standard structural biological 
approaches, transition-state structures can be studied using 
rate-equilibrium free-energy relationship (REFER) analysis, 
which reports on the relative timing of movements in selected 
protein regions during a conformational transition, such as a 
channel opening step (Zhou et al., 2005; Auerbach, 2007). Struc- 
tural perturbations (typically point mutations) in a given channel 
region often change channel open probability (Po) by affecting 
the open-closed free-energy difference, but the extent to which 
the free energy of the barrier that must be traversed in opening or 
closing the channel is affected depends on how early or late that 
region moves. A region that moves early during opening will have 
already approached its open-state conformation in the overall 
transition state: a perturbation here thus affects the stability of 
the transition state to an extent similar to that of the open state 
and so impacts only opening, but not closing, rate. In contrast, 
a region that moves late during opening is still near its closed- 
state conformation in the transition state: a perturbation here 
that affects open-state stability thus does not affect transition- 
state stability and so impacts only closing, but not opening, 
rate. This relative timing of motion of a given region of the chan- 
nel protein is reported by its Φ value, the slope of a log-log plot of 
opening rate (kCO) versus equilibrium constant (Keq = Po/(1 - Po)) 



for a series of structural perturbations (Brønsted plot): a large Φ 
value (close to 1) indicates early movement, and a small value 
(close to 0) indicates late movement. 
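
As an illustration of how such a slope can be obtained, the sketch below fits the Φ value by ordinary least squares to log-transformed opening rates and equilibrium constants. It is a minimal example under stated assumptions rather than the analysis used in the study: the function name and input format are hypothetical, and Keq is computed from the open probability as Po/(1 - Po), as defined above.

```python
import math


def phi_value(opening_rates, open_probabilities):
    """Slope of log10(kCO) versus log10(Keq) across a series of perturbations
    at one position, with Keq = Po / (1 - Po), by ordinary least squares."""
    x = [math.log10(p / (1.0 - p)) for p in open_probabilities]
    y = [math.log10(k) for k in opening_rates]
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx
```

For a two-state equilibrium, Keq equals the ratio of opening to closing rate. A series of perturbations that changes only the opening rate therefore gives a slope of 1 (early movement), and a series that changes only the closing rate gives a slope of 0 (late movement).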

REFER analysis has been extremely fruitful in mapping gating 
dynamics of the nicotinic acetylcholine receptor channel (Mitra 
et al., 2005; Purohit et al., 2007, 2013, 2015) but is applicable 
only to equilibrium mechanisms (Csanady, 2009), unlike that of 
CFTR. This drawback has so far hampered insight into the 
CFTR opening transition state. Here, we exploit a catalytic site 
mutation that abolishes ATP hydrolysis and so truncates the 
CFTR gating cycle to an equilibrium process. In this non-hydro- 
lytic background, we employ the REFER technique to address 
the relative timing of movements within the sub-microsecond 
process of pore opening in three spatially distant positions 
distributed along the longitudinal axis (cytoplasmic to extra- 
cellular) of the CFTR protein. Our results identify a conforma- 
tion-change wave with clear directionality and provide direct 
measurements that outline the global structure of the CFTR 
opening transition state. 

RESULTS 

Choice of a Background Construct Suitable for REFER 
Studies 

ATP hydrolysis in ABC proteins is destroyed by mutations of the 
Walker B aspartate (Urbatsch et al., 1998; Hrycyna et al., 1999; 
Rai et al., 2006) that coordinates Mg²⁺ at each active site
(Hung et al., 1998; Hopfner et al., 2000). To make CFTR gate 
at equilibrium, we introduced the NBD2 Walker-B mutation 
D1370N (Figure 1A, bottom, red star) because, among several 
hydrolysis-disrupting mutations tested, D1370N only slightly re- 
duces the apparent affinity for ATP, and does not prolong open 
bursts to an extent incompatible with single-channel gating anal- 
ysis (Csanady et al., 2010). 

PKA- and ATP-dependent regulation of CFTR gating are inter- 
twined, and the mechanism of R-domain action is poorly under- 
stood: evidence exists for its direct interaction with both NBDs 
and TMDs (Wang et al., 2002; Bozoky et al., 2013). Thus, 
Figure 2. Timing of Motion at Position 1246 of the NBD1-NBD2
Interface

(A) Inward single-channel currents of the cut-ΔR(D1370N) CFTR background
construct (top trace) and of channels bearing mutations T1246V, T1246P,
T1246C, T1246N, and T1246A, respectively, in the same background. Currents
were recorded at -80 mV, in symmetrical 140 mM Cl⁻; dashes on the left
mark zero-current level.

(B-D) Mean burst (B, τb) and interburst (C, τib) durations and open probabilities
(D, Po) of the six constructs in (A). Red horizontal lines highlight the respective
control values of the background construct. All data are shown as mean ±
SEM (n = 4-28).

(E) Brønsted plot for position 1246. Gray symbol identifies the background
construct (also in Figures 3E, 4E, and 5E). Solid line is a linear regression fit with
slope Φ indicated.

See also Figures S1 and S2.

changes in gating kinetics caused by perturbations of a target 
position might potentially reflect altered R-domain/target posi- 
tion interactions, rather than energetic effects on ATP-depen- 
dent conformational transitions. Such confounding effects are 
absent in channels lacking the R domain: cut-ΔR channels, obtained
by coexpression of TMD1-NBD1 (residues 1-633) and
TMD2-NBD2 (residues 837-1480), do not require phosphoryla- 
tion to be active, while ATP-dependent gating remains similar 
to WT (Csanady et al., 2000). 



Figure 3. Timing of Motion at Position 275 of the NBD2-TMD
Interface

(A) Inward single-channel currents of the cut-ΔR(D1370N) CFTR background
construct (top trace) and of channels bearing mutations Y275F, Y275E, Y275K,
Y275L, and Y275S, respectively, in the same background. Currents were
recorded at -80 mV, in symmetrical 140 mM Cl⁻; dashes on the left mark
zero-current level.

(B-D) Mean burst (B, τb) and interburst (C, τib) durations and open probabilities
(D, Po) of the six constructs in (A). Red horizontal lines highlight the respective
control values of the background construct. All data are shown as mean ±
SEM (n = 9-28).

(E) Brønsted plot for position 275. Solid line is a linear regression fit with
slope Φ indicated.

See also Figures S1 and S2.



Thus, we chose cut-ΔR(D1370N) as the background
construct for our REFER study (Figure 1A, bottom). Gating of
cut-ΔR(D1370N) indeed proved PKA-independent but remained
strictly ATP-dependent with an apparent affinity for ATP of 288 ±
27 μM (Figures S1A and S1B). Just as for WT (Winter et al.,
1994; Zeltwanger et al., 1999; Csanady et al., 2000; Vergani
et al., 2003), cut-ΔR (Csanady et al., 2000; Bompadre et al.,
2005), and D1370N (Vergani et al., 2003) CFTR channels,
mean open burst duration (τb) of cut-ΔR(D1370N) proved largely
Figure 4. Timing of Motion at Position 348 in the Pore Region

(A) Inward single-channel currents of the cut-ΔR(D1370N) CFTR background
construct (top trace) and of channels bearing mutations M348I, M348K,
M348C, M348N, and M348A, respectively, in the same background. Currents
were recorded at -80 mV, in symmetrical 140 mM Cl⁻; dashes on the left mark
zero-current level.

(B-D) Mean burst (B, τb) and interburst (C, τib) durations and open probabilities
(D, Po) of the six constructs in (A). Red horizontal lines highlight the respective
control values of the background construct. All data are shown as mean ±
SEM (n = 4-28).

(E) Brønsted plot for position 348. Solid line is a linear regression fit with
slope Φ indicated.

See also Figures S1 and S2.



ATP-independent: the time constant of macroscopic current
relaxation following ATP removal (1,342 ± 72 ms; Figures S2A
and S2H), a measure of τb in zero ATP, was similar to steady-state
τb of single channels in saturating (10 mM) ATP (1,526 ± 301 ms;
Figures 2A and 2B). Thus, ATP-dependence of Po reflects
ATP-dependence of its mean interburst duration (τib). Importantly, in
saturating (10 mM) ATP τib is minimal, and bursting of this background
construct is reduced to a simple equilibrium (C1 ↔ O1; Figure 1B,
red box; see histograms of burst and interburst durations in
Figure S3) suitable for study by the REFER approach.
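For a two-state equilibrium of this kind, the rates, the equilibrium constant, and the burst-level open probability all follow from the two mean dwell times. The sketch below shows that arithmetic. The helper name `two_state_gating` is hypothetical, the burst duration is the 1,526 ms value quoted above, and the interburst duration is a made-up placeholder because the text does not state it. Po here refers to the fraction of time spent in bursts and ignores brief intraburst closures.

```python
def two_state_gating(tau_b_ms, tau_ib_ms):
    """Rates (s^-1), Keq and Po for a C1 <-> O1 scheme from mean dwell times (ms)."""
    k_oc = 1000.0 / tau_b_ms    # closing rate = 1 / mean burst duration
    k_co = 1000.0 / tau_ib_ms   # opening rate = 1 / mean interburst duration
    k_eq = k_co / k_oc          # equilibrium constant of the opening step
    p_o = tau_b_ms / (tau_b_ms + tau_ib_ms)  # identical to Keq / (1 + Keq)
    return {"k_co": k_co, "k_oc": k_oc, "k_eq": k_eq, "p_o": p_o}

# tau_b from the text (1,526 ms); tau_ib = 3000 ms is a hypothetical value
gating = two_state_gating(1526.0, 3000.0)
for name, value in gating.items():
    print(f"{name} = {value:.3f}")
```

The same relations run in reverse: Keq = Po/(1 - Po), which is the form used for the horizontal axis of the Brønsted plots.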
Figure 5. Timing of Motion in the Narrow Region of the Pore Studied
by Anion Replacement

(A) Pairs of segments of inward single-channel current from three patches
containing single cut-ΔR(D1370N) CFTR channels. Each patch was alternately
exposed to bath solutions containing 140 mM of either chloride (upper segments)
or a test anion (lower segments), as indicated to the right: chloride (Cl⁻),
bromide (Br⁻), nitrate (Nt⁻), formate (Fm⁻). Currents in Cl⁻, Br⁻, and Nt⁻ were
recorded at -80 mV, those in Fm⁻ at -100 mV; dashes on the left mark
zero-current level.

(B-D) Mean burst (B, τb) and interburst (C, τib) durations and open probabilities
(D, Po) in the presence of various test anions, each normalized to the value
observed in chloride in the same patch. Red horizontal lines highlight
the respective control values in chloride. All data are shown as mean ± SEM
(n = 5-9).

(E) Brønsted plot for the narrow region of the pore, constructed from normalized
opening rates and equilibrium constants in the presence of the four
permeating anions tested. Solid line is a linear regression fit with slope Φ
indicated.

See also Figure S2.



Timing of Movements in Composite Site 2 of the NBD1- 
NBD2 Interface 

NBD2 Walker-A threonine 1246 makes important contributions 
to forming composite site 2 of the CFTR NBD1-NBD2 dimer, 
by contacting the γ-phosphate of ATP (PDB: 3GD7). Moreover,
this interfacial residue undergoes relative movement upon NBD 
dimerization, as reported by interaction of its side chain across 
the dimer interface with that of opposing NBD1 residue R555 
in open, but not closed, channels (Vergani et al., 2005). To test 
timing of relative movement at this NBD interface position, we
created a series of mutants by replacing the native threonine
with valine, proline, cysteine, asparagine, and alanine, respectively,
and characterized their gating kinetics in inside-out single-channel
patches superfused with 10 mM ATP (Figure 2A).
All of these perturbations dramatically reduced Po (Figures 2A
and 2D) by prolonging mean interburst duration (τib; Figures 2A
and 2C), i.e., by slowing channel opening rate (kCO = 1/τib). In
comparison, mean open burst durations (τb), and hence closing
rates (1/τb), were less affected (Figures 2A and 2B). Correspondingly,
the Brønsted plot for position 1246 yielded a steep slope of
Φ = 0.97 ± 0.19 (Figure 2E), indicating that this position moves
very early during the pore opening conformational transition.
Importantly, although mutations at position 1246 slightly reduce
the affinity for ATP binding (Vergani et al., 2005), 10 mM ATP
remained saturating for each of the mutants (Figure S1C, red bars),
confirming that their reduced opening rates indeed reflect slowing
of step C1 → O1 (kCO, Figure 1B).

Timing of Movements at the NBD2-TMD Interface 

The four coupling helices at the NBD-TMD transmission interface
undergo large movements between inward- and outward-facing
conformations (Dawson and Locher, 2006; Hohl et al.,
2012). Due to the domain swapping observed in ABC exporters,
CH2 of TMD1 (residues 270-274) is in contact with NBD2 (He
et al., 2008), and tyrosine 275 at the C-terminal end of CH2 is
part of a conserved aromatic cluster important for NBD2-TMD
interactions (Mornon et al., 2008). To test timing of motions at
the NBD2-TMD transmission interface, we substituted phenylalanine,
glutamate, lysine, leucine, and serine, respectively, for
tyrosine 275 and studied gating of single mutant channels in
10 mM ATP (Figure 3A). Perturbations at position 275 caused
modest changes in Po but in both directions (Figures 3A and
3D). Kinetic analysis revealed a clear tendency for opposing effects
on channel closing and opening rates, both contributing
about equally to changes in Po: lengthened τb was mostly associated
with shortened τib and (in Y275L) shortened τb with lengthened
τib (Figures 3B and 3C). Changes in opening rate (1/τib)
again reflected changes in rate kCO (Figure 1B), since 10 mM
ATP remained saturating for each mutant (Figure S1C, violet
bars). These coupled changes in opening and closing rates resulted
in a Brønsted plot with an intermediate slope of Φ =
0.50 ± 0.13 (Figure 3E), indicating that position 275 has not yet
reached its final open-like position in the opening transition state.

Timing of Movements in the Pore Region 

Mutations of several pore residues were reported to affect gating
(Zhang et al., 2000; Beck et al., 2008; Bai et al., 2010; Gao et al.,
2013), indicating gating-related movements at these pore positions.
To study the timing of such movements, we chose position
348 in transmembrane helix 6, because mutations here profoundly
affected Po without major effects on conductance (Bai
et al., 2010), making it an attractive target for single-channel
gating studies. To perturb position 348, we systematically replaced
the native methionine with isoleucine, lysine, cysteine,
asparagine, or alanine and recorded single-channel currents of
the mutants in 10 mM ATP (Figure 4A), a saturating concentration
for all constructs (Figure S1C, blue bars). Except for the lysine
substitution, perturbations at position 348 all dramatically
reduced Po (Figures 4A and 4D), and this effect was in every
case due to speeding of closing rate (reduction in τb, Figure 1B),
with little change in opening rate (1/τib, cf. Figure 4C). Interestingly,
the M348K mutation only marginally affected gating (Figures
4A-4D) but increased the affinity for pore block by ATP,
as reported by pronounced flickery block of single-channel currents
in 10 mM ATP (Figure 4A), a bell-shaped ATP dose-dependence
of macroscopic currents (Figure S1C, second blue bar),
and a current overshoot upon ATP removal from macroscopic
patches reflecting rapid unblock (Figure S2F). Of note, even for
M348K, the macroscopic current relaxation time constant
following ATP removal (i.e., τb in zero ATP; Figure S2F) remained
comparable to steady-state τb: thus, even pronounced flickery
block of M348K by 10 mM ATP does not delay pore closure,
consistent with earlier demonstration that the gate, located on
the extracellular side, can readily close while large organic anion
blockers remain bound in the intracellular vestibule (Csanady
and Torocsik, 2014). We also replaced the methionine with glutamate,
but this M348E mutant could not be studied at a single-channel
level due to the presence of subconductance states;
however, the rate of macroscopic current relaxation upon ATP
removal attested to an acceleration of M348E closing rate comparable
to that of the I, C, N, and A mutants (Figures S2G and
S2H). This speeding of non-hydrolytic closure (step O1 → C1
in Figure 1B, rate kOC) by perturbations at position 348, with little
effect on opening rate, led to a Brønsted plot with a small slope of
Φ = 0.20 ± 0.12 (Figure 4E), indicating that this pore region still
resembles its closed-state conformation in the opening transition
state.

Timing of Movements in the Narrow Region of the Pore 
Studied by Anion Substitution 

Previous accessibility studies outlined a short narrow region of 
the pore, confined to approximately one helical turn of pore- 
forming transmembrane helices 1 (residues 102-106), 6 (resi- 
dues 337-341), 11 (residues 1115-1118), and 12 (residues 
1130-1136) (Beck et al., 2008; Fatehi and Linsdell, 2009; El 
Hiani and Linsdell, 2010; Bai et al., 2010; Qian et al., 2011; 
Wang et al., 2011; Gao et al., 2013; Wang et al., 2014b). This 
narrow region was shown to act as a lyotropic “selectivity filter” 
that provides sites of interaction (T338, S341, S1118, T1134) for
permeating anions (McDonough et al., 1994; Linsdell et al., 
2000; Linsdell, 2001; Zhang et al., 2000; McCarty and Zhang, 
2001). Intriguingly, replacement of chloride with nitrate affects 
CFTR gating (Yeh et al., 2015), suggesting that interactions of 
permeating anions with residues lining the “filter” region of 
the open pore energetically contribute to open-state stability. 
Thus, replacement of chloride with other permeant anions 
might be viewed as a structural perturbation of the “selectivity 
filter.” We therefore studied changes in the pattern of single-channel
gating of our background construct cut-ΔR(D1370N)
in response to sudden replacement of cytosolic chloride with 
nitrate, bromide, or formate. Of note, in these experiments 
gating in chloride and in the replacement anion could be 
compared within the same patch (Figure 5A): this arrangement 
eliminates any uncertainties about perturbation-induced frac- 
tional changes in opening rate, precise estimation of which is 
Figure 6. Opening Conformational Wave
and Transition-State Structure Reported
by Φ-Value Analysis

(A) Merged normalized Brønsted plot for the pore
region (blue). Fitting the ensemble of the normalized
data for position 348 (standing triangles) and
for the anion substitution experiments (inverted
triangles) by linear regression (solid blue line)
yielded the indicated slope value (Φ). Brønsted
plots for positions 1246 (red) and 275 (violet) are
normalized versions of the plots in Figures 2E and
3E, respectively. The point representing the
cut-ΔR(D1370N) background construct in chloride is
highlighted by a black circle.

(B) Ribbon representation of CFTR homology models (Corradi et al., 2015) in the closed (left) and open (right) states based on (left) the inward-facing structure of
TM287-288 (Hohl et al., 2012) and (right) the outward-facing structure of Sav1866 (Dawson and Locher, 2006) and cartoon depicting rough domain organization in
the opening transition state (center). The three target positions are highlighted in spacefill on the models and as colored circles in the cartoon. CFTR domain color
coding follows that of Figure 1; threonine 1246 (red), tyrosine 275 (violet), methionine 348 (blue). Blue ribbons in the homology models highlight segments 102-106
(TM1), 337-341 (TM6), 1115-1118 (TM11), and 1130-1134 (TM12), that form the narrow region of the pore (blue ovals in cartoon). Vertical colored arrow illustrates
the direction and timing of the conformational wave during pore opening (early, red; late, blue).
normally dependent on correct judgement of the number of
active channels in each patch. 

In addition to documented reductions in unitary conductance
(Linsdell, 2001), perturbations of the filter by replacement of
permeating chloride with nitrate, bromide, or formate all affected
gating: nitrate and bromide, which bind more tightly in the pore
(Linsdell, 2001), increased Po, while formate, which binds less tightly
(Linsdell, 2001), decreased it (Figures 5A and 5D). Importantly,
both gating effects primarily reflected changes in τb (Figure 5B),
i.e., in rate kOC of step O1 → C1 (Figure 1B), with smaller changes
in τib (Figure 5C); the observations on nitrate replicated those of
Yeh et al. (2015). The slope of the Brønsted plot constructed
from these data was Φ = 0.28 ± 0.03 (Figure 5E), similar to
that of position 348. These ionic replacement studies therefore
provide independent support for a small Φ value in the pore,
confirming late movement of this region during opening.

DISCUSSION 

The general structural orientations of protein domains in the sta- 
ble closed and open states of the CFTR channel have been 
delineated by a large body of previous work. Thus, the prepon- 
derance of evidence has established that the channel’s closed 
state corresponds to a dissociated NBD dimer interface (Ver- 
gani et al., 2005; Mense et al., 2006; Chaves and Gadsby, 
2015) and inward-facing TMDs (Bai et al., 2011; Cui et al., 
2014; Wang et al., 2014a), with the closed gate located on the 
extracellular side of the membrane (Bai et al., 2011; Cui et al., 
2014; Csanady and Torocsik, 2014; Gao and Hwang, 2015).
Similarly, evidence suggests that in the open state the NBDs 
are dimerized (Vergani et al., 2005; Mense et al., 2006; Chaves 
and Gadsby, 2015), while a conducting pore is formed by out- 
ward-facing TMDs (Bai et al., 2011; Cui et al., 2014; Wang 
et al., 2014a). Consequently, several homology models of 
closed- and open-state CFTR have been constructed based 
on crystal structures of homologous ABC exporters in their in- 
ward- and outward-facing conformations (Mornon et al., 2009; 
Corradi et al., 2015) (Figure 6B, left and right). In contrast, far 
less is known about the nature and relative timing of the
sub-microsecond molecular motions that drive the channel from
its closed- to its open-state conformation. 

Here, we have adapted the REFER technique to obtain new
insight into the dynamics of ATP-dependent gating conformational
changes of the CFTR protein: careful choice of the background
construct (see below) allowed selective examination of
the channel opening process (step C1 → O1, Figure 1B). The
strikingly different Φ values obtained for our three target positions
define a clear spatial gradient along the protein’s longitudinal
axis from cytoplasm to cell exterior: the very high Φ value
of ~0.97 for site-2 NBD interface position 1246 (Figures 2E and
6A, red) stands in stark contrast to the low value of
~0.20 for intra-pore position 348. For the pore region a similarly
small Φ value emerges also from our anion substitution studies:
replacement of permeating chloride with anions such as nitrate
and bromide, which bind more tightly to the pore (as indicated
by permeability ratio measurement) (Linsdell, 2001), clearly
stabilizes the open state, whereas replacement with formate, which
binds less tightly than chloride (Linsdell, 2001), destabilizes it (Figure 5D).
Although the precise location at which permeating anions act
to affect CFTR gating is unknown, this strong positive correlation
between anion binding affinity in the filter and open-state stability
does support the notion that the gating effects are caused by
interactions of the anions with residues located somewhere in the
pore. It is notable that the effects of ionic replacement on
open-closed equilibrium were in each case associated with changes in
closing, rather than opening rates (Figures 5B and 5C), implying
that the stability of the transition state, relative to the closed
state, is less sensitive to the permeating anion species. Insofar
as pore-anion interactions are expected to change between
the closed and open state, the implication is that in the transition
state these interactions resemble those in the closed state: i.e.,
the pore is closed. The Brønsted plots for ionic replacement
and for the 348 position closely agree with each other, and the
combined data are well fitted by a single line with a slope of
0.23 ± 0.05 (Figure 6A, blue). Compared to the values for the
NBD1-NBD2 interface and the pore, which are close to the
highest and smallest possible values for this parameter, respectively,
the slope of ~0.50 of the Brønsted plot for NBD-TMD
interface position 275 (Figure 6A, violet) appears intermediate,
distinctly different from the two extremes. This spatially organized
Φ value gradient provides support for the interpretation
that for conformational changes of folded proteins the relative
magnitude of Φ reflects relative timing of ordered sequential
movements (Zhou et al., 2005; Auerbach, 2007), albeit on the
sub-microsecond timescale, as opposed to probabilities of
taking alternative kinetic pathways known to exist for more
random processes such as protein folding (Purohit et al.,
2013). For the CFTR pore opening transition, this spatial
Φ-gradient implies a conformational wave (Figure 6B, large vertical
arrow) initiated by tightening of the NBD dimer around site 2
and propagated with some delay through the NBD-TMD interface
to eventually result in pore opening.
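The link between a Brønsted slope and the partitioning of a perturbation between opening and closing rates can be written out explicitly. If log(kCO) changes linearly with log(Keq) with slope Φ, and Keq = kCO/kOC, then a perturbation that changes Keq by a factor f changes kCO by f^Φ and kOC by f^(Φ-1). The sketch below applies this to the three slopes reported in the text; the function name `split_perturbation` and the tenfold perturbation are hypothetical.

```python
def split_perturbation(fold_change_keq, phi):
    """Fold changes in opening and closing rate implied by a Bronsted slope phi.

    Follows from log(k_co) = phi * log(Keq) + const and Keq = k_co / k_oc.
    """
    fold_k_co = fold_change_keq ** phi
    fold_k_oc = fold_change_keq ** (phi - 1.0)
    return fold_k_co, fold_k_oc

# Slopes reported in the text for the three regions
REPORTED_PHI = [("position 1246", 0.97), ("position 275", 0.50), ("pore", 0.23)]

for label, phi in REPORTED_PHI:
    # A hypothetical perturbation that lowers Keq tenfold
    fold_k_co, fold_k_oc = split_perturbation(0.1, phi)
    print(f"{label}: opening rate x{fold_k_co:.2f}, closing rate x{fold_k_oc:.2f}")
```

With a slope near 1 almost the entire effect falls on the opening rate, with a slope near 0 it falls on the closing rate, and with a slope near 0.5 it is shared about equally, which is the pattern described for positions 1246, 348, and 275, respectively.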

Furthermore, this set of Φ values provides strong global constraints
for the structure of the actual transition state, the highest
free-energy intermediate of the channel opening process (Figure
6B, center). For the NBD interface, the Φ value of ~1 indicates
that it has already reached its open-state conformation,
i.e., the tight dimer is already formed (Vergani et al., 2005). In
contrast, the low Φ value of ~0.2 for the pore region implies it
is still in its closed-like, inward-facing conformation. Finally, the
intermediate Φ value of ~0.5 for position 275 suggests that in
the transition state coupling helix 2 is just on the move: it has
already left its closed-like conformation but has not yet reached
its open-like conformation (Figure 6B, center, bent rods). This
transition state architecture, which emerges from direct measurements
of relative timing of movements, confirms a previous
speculation based on interpretation of enthalpy and entropy
changes determined for the opening transition state (Csanady
et al., 2006) but refutes the alternative proposition that during
opening all structural reorganizations in the cytoplasmic
loops are completed in the channel closed state (Aleksandrov
et al., 2009). The large molecular strain that must arise at the
NBD-TMD interface is the most likely cause of the very high
enthalpy of the opening transition state (ΔH‡ = 117 kJ/mol) and
is only partially compensated by an entropy increase (TΔS‡ >
41 kJ/mol) suggested to reflect dehydration of the closing NBD
dimer interface (i.e., dispersal of the layer of ordered water molecules
into the disordered bulk solution) (Csanady et al., 2006).
Evidently, transition-state free energy (ΔG‡ = ΔH‡ - TΔS‡) of
wild-type CFTR is still very high, as witnessed by its very slow
opening rate of ~1 s⁻¹, >4 orders of magnitude slower than
for the ligand-gated nicotinic acetylcholine receptor (Jackson
et al., 1983). Moreover, it is this transient conformation of
CFTR that is further destabilized (relative to the closed state)
by NBD-TMD interface mutation ΔF508, causing the severe
gating defect of this most common CF-associated mutant.
Indeed, stabilization of the opening transition state seems an
attractive strategy for designing potentiator compounds that
stimulate gating of ΔF508 CFTR: thus, 5-nitro-2-(3-phenylpropylamino)benzoate,
one of its most efficacious potentiators (albeit
with a pore blocking side effect) (Wang et al., 2005), increases
ΔF508 CFTR opening rate by precisely that mechanism (Csanady
and Torocsik, 2014).
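The free-energy arithmetic in this paragraph can be checked with a few lines of code. The sketch below combines the quoted enthalpy and entropy terms into an upper bound on the activation free energy and then, as an added assumption that is not part of the original analysis, converts that bound into a rate with the Eyring equation using a transmission coefficient of 1 at 25°C. The function names are hypothetical.

```python
import math

R = 8.314462618      # gas constant, J mol^-1 K^-1
K_B = 1.380649e-23   # Boltzmann constant, J K^-1
H = 6.62607015e-34   # Planck constant, J s

def activation_free_energy(dh_kj, t_ds_kj):
    """Activation free energy dG = dH - T*dS, all terms in kJ/mol."""
    return dh_kj - t_ds_kj

def eyring_rate(dg_kj, temp_k=298.15):
    """Rate (s^-1) from the Eyring equation.

    Assumes a transmission coefficient of 1; this prefactor is an
    illustrative assumption, not a value taken from the study.
    """
    return (K_B * temp_k / H) * math.exp(-dg_kj * 1000.0 / (R * temp_k))

# Values quoted in the text: dH = 117 kJ/mol, T*dS > 41 kJ/mol
dg_upper = activation_free_energy(117.0, 41.0)   # upper bound on dG, kJ/mol
rate_lower = eyring_rate(dg_upper)               # corresponding lower bound on the rate
print(f"dG upper bound: {dg_upper:.0f} kJ/mol")
print(f"Eyring rate at that barrier: {rate_lower:.2f} s^-1")
```

Because TΔS‡ is given only as a lower limit, the 76 kJ/mol figure is an upper bound on ΔG‡. Under the stated Eyring assumption it corresponds to a rate of a few tenths per second, the same order of magnitude as the ~1 s⁻¹ opening rate quoted in the text.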

Successful adaptation of the classical REFER approach 
to studying CFTR gating dynamics rested on three important 
innovations. 



First, rather than focusing on the kinetics of pore opening and 
closure (Scott-Ward et al., 2007), the durations of bursts of open- 
ings and of long interburst closures were analyzed here, as the 
latter reflect conformational states of the pore associated with 
specific conformations of the NBDs: bursts are linked to tightly 
dimerized NBDs, while interburst closures reflect dissociation 
of the NBD interface around site 2 (Vergani et al., 2005; Chaves 
and Gadsby, 2015). The duration of the “active” pore conforma- 
tion induced by a single ATP occlusion event at site 2 is also re- 
flected by the time constant of macroscopic current relaxation 
upon sudden removal of ATP. Indeed, for all of our constructs 
that afforded macroscopic recordings, such macroscopic cur- 
rent relaxation time constants (Figure S2) were in good agree- 
ment with the mean burst durations obtained by conventional 
burst analysis of steady-state single-channel recordings (Figures 
2B, 3B, 4B, and 5B), confirming that τb indeed reflects the duration
of an activated state of the pore induced by a single ATP
occlusion event.

Second, CFTR bursting follows a non-equilibrium cycle
(Gunderson and Kopito, 1995; Csanady et al., 2010) (Figure 1B)
to which REFER analysis is not applicable (Csanady, 2009).
To study the pore opening step, we therefore employed the
D1370N background mutation that truncates the gating cycle
to an equilibrium scheme (Figure 1B, red frame). Indeed, this is
the key feature that distinguishes our approach from previous
studies and is responsible for its very different outcome. This is
because in the normal hydrolytic background, mutation-induced
changes in the rate of slow non-hydrolytic closure (rate kOC,
Figure 1B) remain unnoticed as long as the much faster hydrolytic
pathway (rate O1 → O2, Figure 1B) dominates pore closure.
It is therefore not surprising that structural perturbations introduced
into the nucleotide binding sites and several TMD/NBD
interface positions of WT CFTR affected only channel opening
rates, yielding apparent Φ values of ~1 for all positions tested
(Aleksandrov et al., 2009). Similarly, previous studies identified
several pore mutations that affected gating (Beck et al., 2008;
Bai et al., 2010), but in the framework of a hydrolytic gating cycle
even the large, almost an order of magnitude, acceleration of
non-hydrolytic closing rates reported here for mutations at position
348 has so far evaded detection. Of note, the ΔF508 mutation
also greatly accelerates non-hydrolytic closure (Jih et al.,
2011), suggesting an intermediate Φ value for position 508,
yet under normal hydrolytic conditions ΔF508 closing rate is
unaffected (Miki et al., 2010; Kopeikin et al., 2014).
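The masking effect described in this paragraph can be illustrated numerically. The sketch below uses a deliberately simplified picture in which the open burst state is left through two parallel pathways whose rates add, and it ignores the later steps of the hydrolytic cycle. The function name and every rate value are hypothetical and chosen only to show the trend.

```python
def mean_burst_duration_s(k_oc, k_hydrolytic=0.0):
    """Mean burst duration (s) when the open state is left by two parallel
    pathways: slow non-hydrolytic closure (k_oc) and a hydrolytic exit."""
    return 1.0 / (k_oc + k_hydrolytic)

# All rates (s^-1) are hypothetical
K_HYD = 10.0          # hydrolytic exit rate
K_OC_CONTROL = 0.1    # non-hydrolytic closing rate, unperturbed
K_OC_MUTANT = 1.0     # non-hydrolytic closing rate, tenfold accelerated

# Fold shortening of bursts caused by the mutation in each background
fold_nonhydrolytic = (mean_burst_duration_s(K_OC_CONTROL)
                      / mean_burst_duration_s(K_OC_MUTANT))
fold_hydrolytic = (mean_burst_duration_s(K_OC_CONTROL, K_HYD)
                   / mean_burst_duration_s(K_OC_MUTANT, K_HYD))
print(f"burst shortening without hydrolysis: {fold_nonhydrolytic:.1f}-fold")
print(f"burst shortening with hydrolysis:    {fold_hydrolytic:.2f}-fold")
```

With these example numbers a tenfold acceleration of non-hydrolytic closure shortens bursts tenfold in the non-hydrolytic background but by less than 10% when the hydrolytic pathway dominates, so a perturbation acting on closing rate would appear to affect opening only.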

Third, removal of the R domain eliminated potential confounding
effects of altered R-domain-mediated gating regulation in our
target-site mutants: not only does the non-phosphorylated R
domain inhibit channel gating, but the phosphorylated R domain
also mediates substantial stimulation of channel Po (Winter and
Welsh, 1997; Csanady et al., 2000), through mechanisms that
are poorly understood. In that regard, our cut-ΔR background
construct, pared down to the canonical ABC domains, reduces
complexity: in addition to obviating the need for prior phosphorylation
by PKA, gating of cut-ΔR CFTR is regulated only by
ATP, similarly to the transport cycle of ABC exporters. Thus,
our Φ-value map likely bears relevance to the transition state
for the inward- to outward-facing transition in this broader family
of CFTR relatives. Because gating of cut-ΔR(D1370N), like that
of WT CFTR, is strictly ATP-dependent (Figures S1 and S2), our
conclusions do not necessarily apply to the mechanism of the
extremely infrequent spontaneous openings observable in the
absence of ATP that are promoted by certain mutations (Wang
et al., 2010) and drugs (Jih and Hwang, 2013).

In conclusion, we have provided an initial characterization 
of the CFTR opening transition-state structure that could serve 
as a drug target for treating CF and developed a technique to 
directly measure timing of movements in distinct regions of the 
CFTR protein during the sub-microsecond process of channel 
opening. By further refining the Φ value map, this approach might
be used in the future to define regions that move as a rigid body 
(Purohit et al., 2013, 2015), or to shed light on potentially asyn- 
chronous movements at the level of the ATP binding sites, the 
coupling helices, or the TM helices that form the pore. 

EXPERIMENTAL PROCEDURES 

pGEMHE-CFTR(837-1480(D1370N)) was constructed from pGEMHE-CFTR(837-1480),
mutations at positions 275 and 348 were introduced into
pGEMHE-CFTR(1-633) (Csanady et al., 2000), and mutations at position 1246
into pGEMHE-CFTR(837-1480(D1370N)) using Stratagene QuickChange.
cDNA was transcribed in vitro using T7 polymerase, and 0.1-10 ng cRNA for
both CFTR segments were co-injected into Xenopus laevis oocytes extracted
from anaesthetized frogs following Institutional Animal Care Committee guidelines.
Currents were recorded at 25°C in inside-out patches excised from
oocytes 1-5 days after RNA injection. Pipette solution contained (in mM) 136
NMDG-Cl, 2 MgCl2, 5 HEPES, pH = 7.4 with NMDG. The continuously flowing
bath solution could be exchanged with a time constant <100 ms. Standard
(chloride-based) bath solution contained 134 NMDG-Cl, 2 MgCl2, 5 HEPES,
0.5 EGTA, pH = 7.1 with NMDG. For anion substitution experiments NMDG-Cl
and MgCl2 were replaced by NMDG and Mg(OH)2, and the solution titrated to
pH = 7.1 with nitric, hydrobromic, or formic acid, respectively. MgATP (Sigma)
was added from a 400-mM aqueous stock solution (pH = 7.1 with NMDG). Unitary
CFTR currents in 10 mM MgATP were recorded at -80 mV (-100 mV for
formate currents) (EPC7, Heka Elektronik) at a bandwidth of 2 kHz and digitized
at 10 kHz. Single-channel patches were identified as very long (typically 15 min
to 1 hr) recordings without superimposed channel openings. For T1246 mutants
strong stimulation by 2'-deoxy-ATP at the end of each experiment was used to
facilitate correct estimation of the number of active channels in the patch (Figure
S4). To reconstruct bursts and interbursts, currents from patches containing
no superimposed channel openings were refiltered at 20 Hz (10 Hz for anion
substitution experiments), idealized by half-amplitude threshold crossing, and brief
closures suppressed (Figure S3) using the method of Magleby and Pallotta
(1983). Opening (kCO) and closing (kOC) rates were defined as 1/τib and 1/τb,
respectively, and Keq as kCO/kOC. All data are given as mean ± SEM of measurements
from at least 4 (typically 5-8) long segments of single-channel recordings,
from 4-13 patches for each mutant; in the face of alternating periods of lower and
higher activity typical to CFTR (Bompadre et al., 2005), several hours of total
recording for each construct were obtained to ensure unbiased sampling of
average gating behavior. Macroscopic current ratios between 3 and 10 mM
ATP were used to verify saturation by 10 mM ATP (Figure S1). Time constants
of macroscopic current relaxations upon ATP removal were obtained from
single-exponential fits (Figure S2).
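The burst reconstruction steps described above (half-amplitude threshold crossing followed by suppression of brief closures) can be sketched in a few lines. This is a minimal illustration, not the analysis software used in the study: the function names, the synthetic trace, and the 2-ms cutoff are hypothetical, the refiltering step is omitted, and the cutoff is passed in as a parameter rather than derived with the Magleby and Pallotta criterion.

```python
def idealize(trace, open_level, dt_ms):
    """Half-amplitude threshold crossing.

    Returns a list of [is_open, duration_ms] dwell events.
    """
    threshold = abs(open_level) / 2.0
    events = []
    for sample in trace:
        # Inward currents are negative, so compare magnitudes
        is_open = abs(sample) > threshold
        if events and events[-1][0] == is_open:
            events[-1][1] += dt_ms
        else:
            events.append([is_open, dt_ms])
    return events

def suppress_brief_closures(events, cutoff_ms):
    """Merge closures shorter than cutoff_ms into the surrounding burst."""
    merged = []
    for is_open, dur in events:
        brief_gap = (not is_open) and dur < cutoff_ms and merged and merged[-1][0]
        if brief_gap:
            merged[-1][1] += dur      # absorb the gap into the preceding opening
        elif merged and merged[-1][0] == is_open:
            merged[-1][1] += dur      # same state as the previous event: extend it
        else:
            merged.append([is_open, dur])
    return merged

def mean_dwell(events, state):
    """Mean duration (ms) of all events in the given state (True = burst)."""
    durations = [d for s, d in events if s == state]
    return sum(durations) / len(durations)

# Synthetic trace (pA), 1 ms per sample: -0.5 pA when open, 0 pA when closed
trace = [0.0] * 5 + [-0.5] * 4 + [0.0] * 1 + [-0.5] * 3 + [0.0] * 6 + [-0.5] * 2
events = idealize(trace, open_level=-0.5, dt_ms=1.0)
bursts = suppress_brief_closures(events, cutoff_ms=2.0)
print(bursts)
print(mean_dwell(bursts, True), mean_dwell(bursts, False))
```

In a real analysis the first and last events of a record are incomplete and would normally be excluded before computing mean burst and interburst durations; the sketch keeps them for brevity.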
SUMMARY

The mechanisms by which intrinsically disordered
proteins engage in rapid and highly selective binding
are a subject of considerable interest and represent a
central paradigm to nuclear pore complex (NPC) function,
where nuclear transport receptors (NTRs) move
through the NPC by binding disordered phenylala- 
nine-glycine-rich nucleoporins (FG-Nups). Combining 
single-molecule fluorescence, molecular simulations, 
and nuclear magnetic resonance, we show that a 
rapidly fluctuating FG-Nup populates an ensemble 
of conformations that are prone to bind NTRs 
with near diffusion-limited on rates, as shown by 
stopped-flow kinetic measurements. This is achieved 
using multiple, minimalistic, low-affinity binding mo- 
tifs that are in rapid exchange when engaging with 
the NTR, allowing the FG-Nup to maintain an unex- 
pectedly high plasticity in its bound state. We propose 
that these exceptional physical characteristics enable 
a rapid and specific transport mechanism in the phys- 
iological context, a notion supported by single mole- 
cule in-cell assays on intact NPCs. 

INTRODUCTION 

The plasticity of intrinsically disordered proteins (IDPs) is thought 
to be key to their highly diverse roles in the eukaryotic interactome 
and a variety of vital processes such as transcription, epigenetic 
regulation mechanisms, and transport through nuclear pore com- 
plexes (NPCs) (Dyson and Wright, 2005; Tompa and Fuxreiter, 
2008). The central channel of the NPC is filled with
phenylalanine-glycine-rich proteins, called FG-nucleoporins (FG-Nups),
that are intrinsically disordered (Denning et al., 2003). FG-Nups
build up an approximately 30-nm-thick permeability barrier 
through which large molecules (>40 kDa) can only be shuttled 
when bound to a nuclear transport receptor (NTR) with passage 
times as fast as 5 ms (Hoelz et al., 2011; Kubitscheck et al., 
2005; Tu et al., 2013; Walde and Kehlenbach, 2010). Due to the 
intrinsic dynamics of the FG-Nups, even state-of-the-art electron 
tomographic studies are not able to visualize them within the cen- 
tral NPC channel, despite their millimolar concentrations (Bui 
et al., 2013). Consequently, the molecular structure of the perme- 
ability barrier and its general mode of action are widely debated 
(for a review see Adams and Wente, 2013). 

The key to understanding the observed nucleocytoplasmic 
transport phenomena resides in a description of the binding 
mode between FG-Nups and NTRs, for which a molecular analysis
of the FG-Nup·NTR interaction is a prerequisite. Our current
understanding of the molecular basis of FG-Nup·NTR interactions
is in large part derived from X-ray crystallographic structures
or molecular dynamics (MD) simulations of NTRs in the
presence of short FG-peptides (up to ~13 amino acids in length)
(Bayliss et al., 2000; Isgro and Schulten, 2005), as well as binding 
measurements with different NTRs or mutated NTR binding 
pockets (Bednenko et al., 2003; Milles and Lemke, 2014; Otsuka 
et al., 2008). Even for FG-Nups alone, only overall chain dimen- 
sions or long-range interactions within the Nups have so far been 
analyzed in solution (Milles and Lemke, 2011; Yamada et al., 
2010). Notably, even such fundamental binding characteristics 
as the equilibrium dissociation constant (Kd) between Nups
and NTRs are still a matter of discussion; estimates range from
a few nM to several mM (Bednenko et al., 2003; Ben-Efraim 
and Gerace, 2001; Tetenbaum-Novatt et al., 2012; Tu et al., 
2013). However, high Kd (low affinity, ~mM) values are not easily 
compatible with high specificity of the transport process, while 
low Kd values (~nM range) cannot easily explain high transport 
rates, since these might be expected to correlate with long

Figure 1. Conformation of Nup153FGPxFG

(A) Scheme of Nup153FG constructs. 

(B) Residual dipolar couplings (RDCs) of Nup153FGPxFG aligned in phages.
Experimentally obtained RDCs (gray bars) were compared with RDCs calcu- 
lated from the ASTEROIDS ensemble obtained on the basis of experimental 
chemical shifts (red line). Dashed lines represent positions of FG-repeats and 
F1374. Color code as in (A). 

(C) The same conformational ensemble was used to calculate a small angle 
X-ray scattering (SAXS) curve using CRYSOL (red line). The back calculated 
scattering curve is in good agreement with measured SAXS data under similar 
experimental conditions (black dots) (Mercadante et al., 2015). 

(D) Distribution of the radius of gyration (Rg) from five equivalent ASTEROIDS
selections. The three conformations displayed on top represent the most 
compact, the least compact, and one of the most prevalent conformations in 
the ensemble. 



residence times whereas NTRs must encounter many FG-Nups 
while crossing the thick barrier. 

Fast protein binding also typically requires proper orientation 
of the protein binding partners as well as conformational adaption
of the IDP to bind to a folded protein. These can occur prior
to or during binding, as described by either of the two prevalent 
models for protein binding, namely conformational selection and
induced fit (Csermely et al., 2010; Wright and Dyson, 2009). 
While such a conformational shift or fit can present the rate- 
limiting step of binding, fast binding is warranted in many biolog- 
ical processes. Several binding rate enhancing effects have 
been suggested or observed experimentally, such as mainte- 
nance of a degree of disorder (termed “fuzziness”; Tompa and 
Fuxreiter, 2008) by conformational tunneling (Schneider et al., 
2015), a large capture radius of the flexible IDPs (Shoemaker 
et al., 2000), and the involvement of long-range electrostatic 
interactions to steer (attract) proteins together (Ganguly et al., 
2013). 

In this work, we characterize the conformational plasticity of 
Nups from human and yeast in the presence of structurally 
and functionally diverse NTRs. A focus was a PxFG-rich
domain of Nup153 (Nup153FGPxFG), as its size permitted
a combination of nuclear magnetic resonance (NMR), single
molecule Förster resonance energy transfer (smFRET), and
molecular dynamics (MD) simulations to characterize local,
residue specific, as well as long-range implications of Importinβ
binding for Nup153FGPxFG conformation and dynamics.
Additional Brownian dynamics (BD), fluorescence stopped-flow,
and single molecule transport experiments with functional
NPCs in permeabilized cells revealed the detailed kinetics
of the complex formation between Nup and NTR. Using this 
molecular, integrative structural biology approach we propose 
a mechanism whereby Nups contribute low-affinity minimalis- 
tic binding motifs that act in concert to create a polyvalent 
complex. The global Nup structure and dynamics are largely 
unaffected by the interaction, thereby ensuring ultrafast binding
and unbinding of individual motifs, a result that explains
how nuclear transport can be fast yet specific, and that may 
have general implications for the mechanism of action of other 
IDPs that exhibit a multiplicity of binding motifs. 

RESULTS 

Nup153FGPxFG Populates a Disordered Ensemble in
Solution

We initially characterized the structure and dynamics of 
Nup153FGPxFG using high resolution NMR (Figure 1A, sequences
given in Supplemental Experimental Procedure).
Complete assignment of the backbone resonances (Fig- 
ure S1) allowed us to develop a multi-conformational model 
of the protein in solution using a combination of Flexible- 
Meccano (Ozenne et al., 2012) and the genetic algorithm 
ASTEROIDS (Jensen et al., 2010). Representative ensembles 
comprising 200 conformers were selected on the basis of 
the experimental chemical shifts and were in excellent agreement
with ¹D(N-HN) and ¹D(Cα-Hα) residual dipolar couplings and
small angle X-ray scattering (SAXS) curves (Mercadante 
et al., 2015) that were not used in the selection process 
(Figures 1B-1D). The amino acid specific backbone dihedral 
angle distributions determined from the ensemble selections 
(Figure S1) show that negligible secondary structure is 
present.

Figure 2. Nup153FGPxFG·Importinβ Interaction Analyzed by smFRET

(A) FRET efficiency (EFRET) versus fluorescence lifetime (τ) histograms of
Nup153FGPxFG in the presence and absence of Importinβ. The dotted line
visualizes the center position of the FRET peak. The dashed (diagonal) lines
show the static EFRET relationship, on which a distribution would lie in the
absence of fast dynamics.

(B) Fluorescence lifetimes (τ) of the double labeled population accumulated
from single molecule data in the absence (black) and presence (green) of Importinβ.
Offset from a single exponential lifetime (dashed gray curve and arrow)
is a strong indicator of protein dynamics.

(C) Fluorescence correlation spectroscopy (FCS) traces retrieved from measurements
of Nup153FGPxFG (black dots) reflect a slower translational motion
in the presence of Importinβ (green dots).



Global Structure and Dynamics of the Nup153FGPxFG
Are Retained upon Interaction with Importinβ as
Measured by smFRET

We labeled Nup153FGPxFG with a donor (Alexa488) and acceptor
dye (Alexa594) for FRET at its C and N termini, respectively.
This allowed us to measure the average distance between the
dyes as well as the dynamic properties of the protein using histograms
relating FRET efficiency (EFRET) and donor lifetimes (τ) of
single molecules (sm), a method widely used to detect even minute
changes in structure and dynamics, for example when IDPs
bind, fold or expand (Kalinin et al., 2010; Milles and Lemke, 2011;
Schuler and Eaton, 2008). 

We added unlabeled Importinβ to the FRET labeled
Nup153FGPxFG and followed the smFRET response. While the
diffusion of Nup153FGPxFG in the absence and presence of
Importinβ confirmed the binding of Importinβ under single
molecule conditions (Figures 2 and S2), we detected neither
substantial changes in EFRET nor in the width of the histograms,
indicating absence of significant changes in the distance distribution
(Figure S2 shows an all F to all A negative control).
Indeed, the EFRET populations of the unbound and bound
Nup153FGPxFG also overlay very closely with respect to τ,
which indicates similarly fast dynamics of both forms (Figures
2 and S2 for detailed analysis of structure and dynamics)
(Kalinin et al., 2010).
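
The static reference line used in such EFRET versus τ histograms can be sketched as follows. This is an illustration under stated assumptions, not the analysis pipeline used here: for a single fixed donor-acceptor distance the textbook relation is E = 1 − τ/τ0, where τ0 is the donor lifetime without acceptor. The value of τ0 and the example burst below are placeholders, and the function names are hypothetical.

```python
def static_fret_efficiency(tau_ns, tau_donor_only_ns):
    """FRET efficiency expected for a rigid (static) donor-acceptor pair."""
    if not 0.0 <= tau_ns <= tau_donor_only_ns:
        raise ValueError("lifetime must lie between 0 and the donor-only lifetime")
    return 1.0 - tau_ns / tau_donor_only_ns


def offset_from_static_line(e_fret, tau_ns, tau_donor_only_ns):
    """Positive values mean the data point lies above the static line."""
    return e_fret - static_fret_efficiency(tau_ns, tau_donor_only_ns)


TAU_D0_NS = 4.0  # assumed donor-only lifetime (placeholder, not from this work)

e_static = static_fret_efficiency(2.0, TAU_D0_NS)
offset = offset_from_static_line(0.6, 2.0, TAU_D0_NS)
print(e_static, offset)
```

A population whose distance fluctuates faster than the observation time is expected to sit off this line, which is why a shared offset for the free and bound forms is read as similar dynamics.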

As smFRET is compatible with large proteins, we were able to 
repeat the same experiments for the same PxFG region within 
the full-length Nup153FG (601 amino acids), finding similar characteristics
and suggesting that our truncated Nup153FGPxFG
largely retains the conformational sampling from within the whole 
Nup153FG (Figure S2). 

In order to determine the general nature of this binding mode, 
we repeated the experiments with two different FxFG-rich re- 
gions of Nup153FG, as well as the GLFG-rich yeast Nup49 
and several different NTRs: i) transportin 1 (TRN1), a transport 
receptor involved in the import of proteins containing an M9 
recognition sequence, ii) nuclear transport factor 2 (NTF2), 
the import receptor of RanGDP and iii) chromosomal region 
maintenance 1 (CRM1), a major exportin. While TRN1 and
CRM1 have a similar molecular weight and superhelical structure
to Importinβ, NTF2 is a much smaller, β sheet-rich dimer
(Cook et al., 2007; Morrison et al., 2003). As detailed in Fig- 
ure S3, despite the very distinct functionalities of the different 
NTRs, the smFRET and FCS measurements of the different 
Nups and NTRs indicate similar binding characteristics as for 
the Nup153FG·Importinβ complex.

Interaction with Importinβ Influences Nup153FGPxFG
Only Locally and Transiently

To characterize the effects of Importinβ binding on Nup153FGPxFG
at atomic resolution, we titrated Importinβ into a solution of ¹⁵N
labeled Nup153FGPxFG and measured ¹H-¹⁵N HSQC spectra at
different molar ratios. Peak intensities, as well as ¹H and ¹⁵N
chemical shifts of Nup153FGPxFG, were analyzed for each
titration step (Figures 3 and S4). Resonance line broadening,
associated with small changes in both ¹H and ¹⁵N chemical
shifts, was observed around all Fs in the Nup sequence (Figure
3A). Binding was clearly highly localized and limited to Fs,
with only F and the immediately adjacent amino acids being
affected by the interaction. Interestingly, one single F, which
is not associated with a G, is also involved in binding to Importinβ,
showing the largest chemical shift changes in the ¹H-¹⁵N
HSQC spectrum during titration with Importinβ (Figures 3A
and S4). ¹⁵N relaxation rates measured as a function of the molar
ratio of Importinβ suggest that, overall, the molecule remains flexible
in the complex, with the transverse relaxation rate (R2) increasing
significantly upon Importinβ titration only around the interaction
sites (Figures 3C and S4), in agreement with the above
smFRET-based observations that global disorder and flexibility
are not affected by Importinβ binding. Carr-Purcell-Meiboom-Gill
(CPMG) relaxation dispersion experiments (Figure S4) suggested
that fast exchange (< 10 μs) between the bound and
unbound forms of Nup153FGPxFG gives rise to the increased R2
rates around the interaction sites, which makes it possible
to estimate a residue-specific Kd,individual for each position in
Nup153FGPxFG with Importinβ (Figures 3E, 3F, and S4) from the
population weighted R2 measurements. Interestingly, the FG-specific
affinities to Importinβ are not identical across the
Nup153FGPxFG sequence, implying a contribution of inter-FG residues
to binding, although all FG-specific Kd,individual values lie in
the millimolar range.
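
One way to see how a population-weighted R2 can yield a local Kd is the following sketch. It rests on assumptions that are not spelled out in the text: in fast exchange the observed rate is taken as R2,obs = R2,free + pb·(R2,bound − R2,free), and for weak binding (Kd much larger than the concentrations used) the bound fraction is approximated as pb ≈ [Importinβ]/Kd, so the slope of R2,obs against concentration equals (R2,bound − R2,free)/Kd. The bound-state rate, the titration values, and the function names are hypothetical.

```python
def linear_fit(x, y):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def local_kd_from_r2(conc_mM, r2_obs, r2_bound):
    """Local Kd (mM) from the slope of R2 versus ligand concentration.

    Valid only in the weak-binding, fast-exchange limit described above.
    """
    slope, r2_free = linear_fit(conc_mM, r2_obs)  # intercept = free-state R2
    return (r2_bound - r2_free) / slope


# Hypothetical titration for one residue (not data from this work).
conc_mM = [0.0, 0.04, 0.08, 0.16]   # ligand concentration
r2_obs = [4.0, 4.8, 5.6, 7.2]       # observed R2 in s^-1
kd_mM = local_kd_from_r2(conc_mM, r2_obs, r2_bound=44.0)  # assumed bound-state R2
print(f"local Kd = {kd_mM:.2f} mM")
```

The estimate scales directly with the assumed bound-state R2, so any error in that value carries over to the Kd.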
Figure 3. Nup153FGPxFG·Importinβ Interaction by NMR Spectroscopy

(A) HSQC spectrum of Nup153FGPxFG (red) overlaid with a spectrum of Nup153FGPxFG in the presence of Importinβ (green, Nup to NTR molar ratio of
1.14, at a Nup concentration of 240 μM).

(B) The intensity ratio of the bound and unbound form of Nup153FGPxFG was plotted under the same conditions as in (A).

(C) R2 relaxation rates at 25°C and a frequency of 600 MHz were measured at different concentrations of Importinβ (gray bars are without Importinβ; black,
light green, and dark green at Importinβ/Nup153FGPxFG molar ratios of 0.17, 0.33, and 0.72 at the constant Nup153FGPxFG concentration of 250 μM).

(D) ¹⁵N R2 of the Nup153AG mutant in the absence (gray) and in the presence of Importinβ (red) overlaid with the rates for Nup153FGPxFG in the presence of
Importinβ under the same conditions (green).

(E) For all F in the Nup153FGPxFG sequence, R2 values were plotted against Importinβ concentration and fitted with a linear slope. The same analysis was
performed for F1374 in the Nup153AG mutant and compared to the same F in Nup153FGPxFG (compare red to green slope). R2 with errors greater than 20% were
excluded from the analysis.

(F) Local Kd values were calculated from the slopes obtained in Figure S4. Gray bars correspond to Kd values obtained from Nup153FGPxFG; the red bar shows the
local Kd of the Nup153AG mutant binding to Importinβ.

Error bars show SD.



Strikingly, when studying the binding to different NTRs like 
TRN1 and NTF2 (Figure S4), despite exhibiting different binding 
preferences for FG-Nups (Cook et al., 2007; Milles and Lemke, 
2014), their binding modes are remarkably similar to that of the 
Importinβ complex. The same regions in Nup153FGPxFG are
affected by the interaction, again with very low residue specific
affinities, with the Nup remaining overall flexible when bound
while interacting only locally as seen from both chemical shift 
changes, in the case of NTF2, and remarkably similar locally 
elevated transverse relaxation rates in TRN1 (Figure S4). Comparison
of ¹³C backbone chemical shifts measured in the free
and NTF2-bound forms of Nup153FGPxFG demonstrates that

Figure 4. Binding of Nup153FGPxFG to ImportinβN

(A-C) Contact area between (A) Nup153FGPxFG and ImportinβN and (B) diffusion
coefficients D as a function of time for the 4 binding events out of 10
simulations (gray/black: prior to binding; different colors: after binding; black/
red curves refer to the cartoon in (C)) sampled using the CHARMM22* force field.

(C) Snapshots collected along one of the recorded MD trajectories showing the
binding between Nup153FGPxFG (red cartoon) and ImportinβN (gray surface).
The binding sites on ImportinβN and Nup153FGPxFG FG-repeats are colored in
orange and cyan, respectively.

(D) Nup153FGPxFG radius of gyration (Rg) as a function of end-to-end distance
(Re) for the unbound (black) and bound (green) ensembles of Nup153FGPxFG
obtained from the simulations performed using CHARMM22*.

See Figure S5 for data using the AMBER force field.

the protein backbone remains flexible upon interaction, sam- 
pling effectively the same conformational equilibrium in the free 
and bound state (Figure S4). 

We note that during the publication process of this work, local- 
ized interaction was also reported for the yeast Nsp1 with Kap95 
(the yeast homolog of Importinβ) using NMR (Hough et al., 2015),
suggesting that a similar interaction mechanism may also be 
conserved across species. 

Co-operativity of FG-Nup·Importinβ Binding

To further quantify the action of multiple FG-repeats, we designed
a Nup construct in which all F of Nup153FGPxFG except
F1374, the strongest interaction site for Importinβ, were replaced
by A (Figure S1). Titration of Importinβ into this
Nup153AG mutant resulted in strongly reduced peak
broadening and negligible chemical shift changes compared to
Nup153FGPxFG (Figure S4). As in the case of Nup153FGPxFG,
R2 relaxation rates of the Nup153AG mutant at the interaction
site exhibited a linear dependence on Importinβ concentration
(Figure 3E). However, the effective Kd,individual from F1374 within
the Nup153AG mutant reveals significantly weaker binding for
this interaction site than for F1374 when situated within
the wild-type (WT) protein (Kd,individual = 7.3 mM compared to
0.8 mM, Figure 3). This result clearly shows that presenting multiple
equivalent binding sites to the binding partner has a
measurably positive effect on the effective affinity of the individual
interaction site.

Monitoring the Nup153FGPxFG·ImportinβN Binding Using
All-Atom MD

We employed MD simulations to investigate the experimental
observations of Nup153FGPxFG·Importinβ association from
NMR and smFRET. From a broad ensemble of Nup153FGPxFG
obtained from unbiased MD simulations in explicit solvent
(Movie S1), we incubated different conformers with the N-terminal
portion of Importinβ (from here named ImportinβN (Bayliss
et al., 2000)) and monitored their binding for a total simulation
time of 2 μs (Figures S5 and S6, and Table S1). The association
of Nup153FGPxFG to ImportinβN was repeatedly observed within
the simulated timescale and occurred in a specific manner (Figures
4 and S5, and Movie S2). FG-repeats docked into previously
identified binding pockets on the surface of ImportinβN and
even formed contacts similar to those previously observed crystallographically
upon interaction between Importinβ and Nsp1-derived
peptides (Figures 4C and S6) (Bayliss et al., 2000). Binding
was reduced and less specific for Nup153FG^^^ (Figure S5),
in agreement with NMR and smFRET (Figures S1, S2, and S4).

We suggest that the high solvent exposure of Fs in the un- 
bound state (typically contained within the hydrophobic interior 
of folded proteins) (Figure S5) renders them readily available 
for Nup153FGPxFG·ImportinβN association, without requiring
any global structural transitions in either partner (Figures 4D, 
S6, Movie S2). 

The ability to monitor spontaneous Nup153FGPxFG·ImportinβN
association on the sub-microsecond timescale suggests an 
ultrafast association (Figure S5). Underlining the generality of 
our observation, we were also able to monitor such a spon- 
taneous binding event when repeating simulations for an 
FxFG-rich region of Nup153 binding to ImportinβN (Figure S5,
Movie S3, sequences given in Supplemental Experimental Proce- 
dure). However, force field inaccuracies and limited sampling 
prohibit the reliable extraction of an association rate, and we 
therefore studied the interaction further through fluorescence 
stopped-flow experiments (FSF) and Brownian dynamics (BD) 
simulations. 

FSF Experiments and BD Simulations Reveal Ultrafast 
Binding between Nup and Importinβ

Stopped-flow kinetics monitoring fluorescence anisotropy (r) 
can be used to study binding mechanisms and measure the association
rate (kon) between proteins (Shammas et al., 2013). The
binding of Importinβ to Nup153FG site-specifically labeled with
Cy3B elicits detectable changes in r, due to slowed rotational

Figure 5. Association Kinetics for Nup153FG with Importinβ

(A) Stopped-flow fluorescence anisotropy was used to monitor the binding of Importinβ (Impβ) at different concentrations to Nup153FG-Cy3B. A selection of
anisotropy (r) traces against time is shown for Nup153FG alone (purple) and for the binding of Importinβ WT (black) and ImportinβDM (red).

(B) The observed rates (kobs,ultrafast) from association experiments were plotted against the different Importinβ concentrations; the data were linearly fitted to
obtain the association rate constants (kon,ultrafast).

(C) Apparent Kd,app values under the different experimental conditions.

(D) kon obtained from association experiments of Nup153FG and Importinβ at different ionic strengths fitted with a Debye-Hückel-like approximation to calculate
the basal rate constant at infinite ionic strength. 

(E) Summary of the kon values obtained from BD (dark bars) and FSF measurements (light bars) (Table S2D). 

Error bars show SD. 



motion (Milles and Lemke, 2014). Since Nup153FGPxFG has only
a very small overall binding affinity toward Importinβ, we could
not detect a sufficiently strong signal change in the anisotropy
measurements in the tested and experimentally feasible concentration
range (Figure S7). Thus, for FSF, we used fluorescently
labeled full-length Nup153FG. We performed rapid mixing experiments
under pseudo-first order conditions in “physiological”
transport buffer. A monoexponential function does not describe
the observed anisotropy changes in Figure 5 well (Figure S7 and
Table S2). This is likely a result of having multiple different binding
motifs and/or the ability of multiple Importinβ molecules to engage in
binding a single Nup, which adds another level of complexity
(multivalency) (Milles and Lemke, 2014; Schoch et al., 2012;
Wagner et al., 2015). A biexponential equation is able to describe
the kinetics, resulting in two kobs per Importinβ concentration
(Figures 5A, 5B, and S7). The fluorescence anisotropy at the
end of the reaction was used to calculate the apparent Kd,app
(Figure 5C). Remarkably, by performing experiments at multiple
NTR concentrations we extracted an ultrafast kon,ultrafast =
1.5-10® M“^s“^ (Figure 5B) for the major component (average 
amplitude of 70%), while the second component was still very 



fast, with a kon, fast = 6.1 • 1 0^ M“^s“'' at room temperature. These 
FSF measurements report on overall formation of the 
Nup153FG·Importinβ complex, i.e., one or more F binding. While
we provide all results and further analysis details in Figure S7 and 
Table S2, for later discussion we focus on the fastest measured
kon,ultrafast.
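
The pseudo-first-order analysis described above can be sketched in a few lines. This is an illustrative reconstruction, not the fitting code used for Figure 5: each anisotropy trace is described by a biexponential, and the observed rate of one component is then plotted against the Importinβ concentration, with the slope of a straight line taken as kon. Reading the intercept as koff holds only for the simplest one-step binding model, which is an assumption here. All numbers and function names are hypothetical.

```python
import math


def biexponential_anisotropy(t, r_end, a_fast, k_fast, a_slow, k_slow):
    """Anisotropy r(t) approaching r_end with two observed rates (s^-1)."""
    return r_end - a_fast * math.exp(-k_fast * t) - a_slow * math.exp(-k_slow * t)


def linear_fit(x, y):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical observed rates at three receptor concentrations (excess receptor).
importin_M = [1e-6, 2e-6, 4e-6]
k_obs = [250.0, 450.0, 850.0]  # s^-1

k_on, k_off = linear_fit(importin_M, k_obs)  # slope in M^-1 s^-1, intercept in s^-1
print(f"k_on = {k_on:.3g} M^-1 s^-1, intercept = {k_off:.3g} s^-1")
```

Fitting each exponential component separately in this way gives one rate constant per component, which is how two kon values can arise from a single set of traces.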

We next estimated association rates from BD simulations, 
which compared to MD permit larger statistical sampling, at 
the cost of freezing the internal dynamics of the binding partners. 
Upon successful complex formation, starting from the confor- 
mations obtained from MD, the association rate was estimated 
(Figure S7) to be around 10® M“^ s“^ (Figure 5E), in agreement 
with stopped-flow measurements. 

BD simulations carried out without the contribution of apolar 
desolvation generated a drastic decrease of the estimated kon,BD
by around two orders of magnitude, while the absence of elec- 
trostatic interactions had a negligible effect (Figures 5E and 
S7, and Table S2D and S2E). These observations complement 
our evidence for an association mainly favored by the energetic 
gain of sequestering F residues from the solvent and burying
them into the ImportinβN binding pockets.

Figure 6. Nuclear Transport Assays of Importinβ and ImportinβDM

(A and B) DAPI staining shown in blue, and green fluorescent cargo (NLS-MBP-eGFP)
in permeabilized HeLa cells incubated with either Importinβ (A) or
ImportinβDM (B) (scale bar 50 μm). After 45 min, cargo accumulation is higher in
the nucleus in (A).

(C) Single molecule trajectories of fluorescently labeled Importinβ were acquired
in the equatorial plane of the nucleus exploiting an inclined (HILO) illumination.

(D) Representative image of acquired single molecule trajectories of Importinβ-Alexa488
(red lines) overlaid with the ensemble image of Importinβ-Alexa647 (in
green, scale bar 1 μm) used to identify the nuclear envelope position (blue line).
Single particle tracks of the fluorescently labeled NTR (cyan lines) crossing the
nuclear envelope were analyzed to yield the characteristic barrier crossing time.

(E) The crossing time distributions reported for Importinβ (blue bars) and ImportinβDM
(red bars) are very fast.



(Figures 5D and S7 and Table S2B) (Shammas et al., 2014). 
In line with the BD simulations, we obtained a kon, elect off of 
2.9-10® M“^s“\ i.e., binding remains very fast even under elec- 
trostatic shielding. 
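
An extrapolation of kon to infinite ionic strength can be sketched as below. The exact functional form used for Figure 5D is not given in the text, so this example assumes one commonly used Debye-Hückel-like expression, ln kon = ln kon,basal − (U/RT)·1/(1 + κa), in which the screening term vanishes at infinite ionic strength. The Debye length of about 0.304 nm at 1 M 1:1 salt in water at 25°C is a standard value; the contact distance a = 0.6 nm, the synthetic data, and all function names are assumptions for illustration.

```python
import math

DEBYE_LENGTH_NM_AT_1M = 0.304  # water, 25 C, 1:1 electrolyte at 1 M


def linear_fit(x, y):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def screening_term(ionic_strength_M, contact_distance_nm=0.6):
    """1 / (1 + kappa * a); tends to 0 as ionic strength goes to infinity."""
    kappa = math.sqrt(ionic_strength_M) / DEBYE_LENGTH_NM_AT_1M  # nm^-1
    return 1.0 / (1.0 + kappa * contact_distance_nm)


def fit_basal_rate(ionic_strengths_M, k_on_values):
    """Return (basal k_on, U/RT) from a linear fit of ln k_on vs screening term."""
    x = [screening_term(i) for i in ionic_strengths_M]
    y = [math.log(k) for k in k_on_values]
    slope, intercept = linear_fit(x, y)
    return math.exp(intercept), -slope


# Synthetic data generated from hypothetical parameters, then refitted.
TRUE_BASAL, TRUE_U_OVER_RT = 1.0e8, -3.0  # attractive electrostatics: U < 0
salt_M = [0.05, 0.1, 0.25, 0.5, 1.0]
k_on_synthetic = [TRUE_BASAL * math.exp(-TRUE_U_OVER_RT * screening_term(i))
                  for i in salt_M]

k_basal, u_over_rt = fit_basal_rate(salt_M, k_on_synthetic)
print(f"basal k_on = {k_basal:.3g} M^-1 s^-1, U/RT = {u_over_rt:.2f}")
```

The fitted intercept corresponds to the rate constant with electrostatics fully shielded, while the slope reports the size of the electrostatic contribution.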

Additional stopped-flow measurements probing different 
Nup153FG regions (FxFG-, PxFG-rich) with diverse NTRs 
(NTF2, TRN1, Importinβ) are shown in Figure S7 and Table
S2C. In all cases, we observed similar remarkably fast kinetics 
yielding consistent results for kon > 5-10® M“^s“\ 

Previously, solid phase binding assays indicated that the Importinβ
double mutant (I178D/Y255A, termed ImportinβDM) has
a more than 60-fold lower affinity for binding to full-length Nup153FG
as compared to Importinβ WT (Bednenko et al., 2003). kon,BD
dropped by only 40% compared to Importinβ WT, which we
confirmed by experimental FSF studies (drop of kon,FSF by
30%, Figure 5). However, fluorescence anisotropy measurements
revealed an ImportinβDM titration curve (Figure 5C) that
confirms altered binding as compared to Importinβ WT,
e.g., due to an increase in koff.

Single-Particle Tracking Connects Nuclear Transport of 
ImportinβDM and Importinβ with FG-Nup Association
Rates 

The efficiency of an NTR to bring cargo across the NPC barrier 
can be assayed using standard NPC transport assays. In these 
assays, a fluorescent cargo (NLS-MBP-eGFP) recognized by 
the Importinβ transport machinery is incubated with permeabilized
cells in the presence of a functional transport system and
the resulting nuclear fluorescence is measured. In line with the
previously reported lower affinity of ImportinβDM, cargo accumulated
more slowly compared to Importinβ WT measurements (Figures 6A
and 6B), which can, for example, be due to a slower barrier
crossing, a reduced docking efficiency to the NPC, or cargo release
from the NPC.

A prediction from our kinetic analysis is that the actual speed 
of barrier crossing, which involves several binding and unbinding 
steps between NTR and FG repeats should be rather similar for 
WT and mutant Importinβ, as changes in kon were small, and if at
all, a higher koff for the mutant would make crossing even faster 
(see discussion). 

In contrast to the “bulk” transport assay, the speed of barrier 
crossing (characteristic crossing time) can be measured directly 
using single molecule (sm) tracking assays (Figure 6C), in which 
individual Importinβ molecules are fluorescently labeled and
tracked while they cross from one side of the NPC to the other.
This yielded a typical value of 6.9 ± 0.2 ms for Importinβ and
6.1 ± 0.5 ms for ImportinβDM for barrier crossing (Figures 6D
and 6E). We note that this crossing time is near the sampling limit 
of our technology, and thus faster crossing times cannot easily 
be captured. 



While desolvation effects cannot easily be tested experimen- 
tally, high ionic strength buffers can be used to shield long- 
range electrostatic interactions. We thus performed a salt 
titration ranging from 0.05 to 1 M ionic strength (using NaCl),
permitting an estimate of kon under infinite electrostatic shielding
by extrapolation using a Debye-Hückel-like approximation



DISCUSSION 

The realization that many proteins are disordered has attracted 
considerable attention to the study of the molecular mechanisms 
controlling their interactions (Csermely et al., 2010; Tompa and 
Fuxreiter, 2008; Wright and Dyson, 2009), including the role of 
disorder in promoting or facilitating binding. In particular, very

Figure 7. Binding Modes of IDPs to Folded
Proteins 

Schematic representation of various models
describing the binding of an IDP to its folded
partner. In an induced-fit mechanism the IDP
partially or completely folds upon interacting with
its partner, potentially showing an intermediate
encounter complex as in the fly-casting mechanism
(Shoemaker et al., 2000). In a conformational
selection mechanism, the folded protein selects 
one (or several) conformation(s) of the IDP that 
best fits its binding pocket. These models suggest 
a shift in the IDP’s conformational ensemble. For 
the Nup·NTR complex we observed formation of
an “archetypal” fuzzy and multivalent complex, a 
binding mode that on a global scale does not 
require major energy or time investment for the 
Nup to transit from its free to the bound confor- 
mation. Note that multiple NTRs can bind one Nup 
and vice versa. 



little is known about the binding mechanisms involved in com- 
plex processes such as nucleocytoplasmic transport, where 
NTRs have to engage in multiple, specific binding and unbinding 
events while traversing a tens of nanometer thick permeability 
barrier. 

In this study, we have used a multidisciplinary approach to 
investigate the molecular mechanism underlying the interaction 
process between NTRs and Nups. In general, from our three 
core findings a coherent view emerges on how multiple rapid, 
yet specific protein interactions can be achieved. 

Nup153FG Forms a Highly Dynamic Complex with 
Importinβ

Based on our smFRET measurements, we found that
Nup153FGPxFG resembles full-length Nup153FG with respect
to its dynamics (Figures 2 and S2). Upon interaction with Importinβ,
Nup153FGPxFG remains flexible, engaging with Importinβ
only locally, as is evident from peak broadening in the respective
HSQC spectra as well as R2 relaxation rates (Figures 3,
S1, and S4). Local backbone sampling even of the interacting
F was not measurably modified upon interaction. The conformers
of Nup153FGPxFG that were subjected to ImportinβN
binding in the MD simulations were also devoid of large-scale
conformational changes, and interactions were only observed
between individual surface exposed residues of Nup153FGPxFG
and ImportinβN.

It appears therefore that globally, the FG-Nup maintains its 
conformational ensemble as shown by smFRET. This observa- 
tion is sound, as IDPs frequently use motif binding to engage 
with their binding partners (Kragelj et al., 2015; Schneider 
et al., 2015; Tompa and Fuxreiter, 2008; Wright and Dyson, 
2009). Our observation suggests an extraordinarily small motif 
(the side chain of F), which would be difficult to identify from 
large-scale bioinformatics approaches (Dinkel et al., 2014). 

The observed binding mode appears distinct from other single
motif binding interactions, as well as from mechanisms that
involve global conformational transitions, such as folding upon
binding (Csermely et al., 2010; Wright and Dyson, 2009)
(Figure 7). The intrinsic flexibility of the Nup, the repeated occurrence
and short length of the binding motif seem to create a highly
reactive binding surface, which renders the individual FG-motifs
prone to bind at any time without compromising the Nup's
inherent plasticity.

Ultra Rapid Association of the Nup153FG·Importinβ
Complex

The maximal association rate in the absence of electrostatic 
forces for a binary interaction system (in which all collisions are 
productive) can be approximated by the Einstein-Smoluchowski 
diffusion limit, which yields a theoretical kon of ~10® M“^s“^ for 
the interaction of proteins of the size of Nup153FG and 
Importinp. 

Very high association rates have been observed previously in 
the presence of long-range electrostatic attractions (10^-10^° 
M“^s“^) for example for the barnase/barstar interaction (Spaar 
et al., 2006), as well as for small IDP complexes studied by 
NMR (Arai et al., 2012; Schneider et al., 2015). In the absence 
of electrostatic steering, this upper limit is typically never 
reached, as successful collisions require proper orientation 
of the binding partners. Consequently, most experimentally 
observed association rates at high salt concentrations fall into 
the regime of 10^-10® M“^s“^ (Shammas et al., 2013, 2014). 

Our ensemble FSF kinetics (for Nup153FG) and BD simula- 
tions (for Nup153FG^'''^°) show a kon of ~10^ M“''s“'' (Figure 5) 
supporting the aforementioned idea of a strongly reactive binding
surface. We specifically observe an influence of apolar
desolvation energies in the BD simulation, and electrostatics are not
found to play a major role in association. This apparently applies
both to Nup153FG^PxFG, which is uncharged and was tested in
BD, and to Nup153FG, which has several charges in the
N-terminal regions (Figures 5D and S2). Even in the limiting
case of electrostatic shielding, we found complex formation to
still have a remarkably fast kon,FSF (Figures 5D and 5E and Table
S2B).

While experimentally bridging the gap between our molecular-level
description of the small binary Nup·NTR complex (160 kDa)
in solution and the actual in vivo transport mechanisms (involving
~120 MDa NPCs) is still a challenging quest, the sm transport
experiments (Figure 6) underline that the initially unexpected
kinetic findings for the Importinp^'^ mutant are in line with the 
finding in functional NPCs. 

Individual FG-Repeats Bind with Low Affinity and Act in 
Concert for Efficient Binding 

According to ensemble titration fluorescence curves, we have
observed an apparent local equilibrium constant (Kd,app)
between Nup153FG and Importinβ in the nanomolar regime
(Figure 5C and Table S2). However, we report millimolar
affinities per FG-motif from our NMR measurements within
Nup153FG^PxFG (Figures 3 and S4), in line with a recent
computational model (Tu et al., 2013). Our NMR studies further suggest
that individual FG-motifs bind independently of each other, as
the R2 rates are similar to the values of the unbound Nup
between the FG-repeats. Nevertheless, the sum of FG-motifs
influences the effective binding strength of individual FGs to
Importinβ, as can be seen by comparing the effective Kd for
F1374 in the WT and the Nup153AG^^^’ mutant (Figure 3, 
S1, and S4). 

While these estimates of Kd values (from NMR and ensemble
fluorescence) were measured on different Nup constructs, they
also report on two different properties: the binding of Importinβ
to a larger region of Nup153FG (fluorescence anisotropy) and
to a single FG-motif (NMR). Together they illustrate an important
characteristic of the system, namely the importance of polyvalent
interactions, which is also exploited by other transport receptors
(Figure S4). While an individual FG-motif might be unlikely to
be bound, the chances that at least one FG-motif within the
Nup molecule is bound may remain high. This stabilizing effect
of multivalency/polyvalency is well known, and is even used as 
a design principle in enhancing the affinity of ligand interactions 
with multi-site targets where ligands are connected in tandem 
via short linkers (Brabez et al., 2011; Kramer and Karpen, 
1998). Stability enhancements achieved in such experiments 
can approach four-to-five orders of magnitude and are primarily 
due to substantial decreases in the global dissociation rate, i.e., 
in a multivalent system the molecules only separate as a result of 
a dissociation event if all other motifs are unbound. 
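
The stabilizing effect of multivalency can be illustrated with a minimal
sketch. It assumes that motifs bind independently and with identical
affinity, uses a simple 1:1 isotherm for the single-motif occupancy, and
plugs in illustrative values (a 1 mM per-motif Kd, 10 µM free receptor,
and up to 24 motifs). The function names and concentrations are
illustrative assumptions, not values or routines taken from the
measurements reported here.

```python
def fraction_bound(ligand_conc_molar: float, kd_molar: float) -> float:
    """Occupancy of a single site for a simple 1:1 binding isotherm."""
    return ligand_conc_molar / (ligand_conc_molar + kd_molar)


def prob_at_least_one_bound(p_single: float, n_motifs: int) -> float:
    """Probability that at least one of n independent motifs is bound."""
    return 1.0 - (1.0 - p_single) ** n_motifs


# Assumed, illustrative numbers: 1 mM per-motif Kd, 10 uM free receptor.
p_single = fraction_bound(10e-6, 1e-3)

# Compare a single motif with 4 and 24 motifs on the same chain.
for n in (1, 4, 24):
    print(n, round(prob_at_least_one_bound(p_single, n), 3))
```

Under these assumptions, the single-motif occupancy stays near 1%,
whereas the probability that at least one of 24 motifs is engaged is
many times larger. The sketch treats motifs as strictly independent, so
it does not capture the additional decrease in the global dissociation
rate described above.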

To demonstrate generality of these three core findings, we 
performed additional smFRET, FCS (Figures S2 and S3), NMR 
(Figure S4), MD (Figure S5 and Movie S3), and FSF experiments 
(Figure S7 and Table S2C) on a variety of different Nups from hu- 
man and yeast, including the most common motif in vertebrates 
(FxFG) and the crucial GLFG sequence in yeast, for a diverse set 
of NTRs (NTF2, TRN1, CRM1, Importinβ). All results are in close
agreement, highlighting the universal nature of the observed 
mechanism. 

Currently, several models are discussed on how a permeability
barrier in the NPC can be built; among those are the selective
phase, the brush, the reduction of dimensionality, and the
karyopherin-centric models, as well as mixtures of those (Eisele
et al., 2013; Frey and Görlich, 2007; Jovanovic-Talisman et al.,
2009; Lim et al., 2007; Lowe et al., 2015; Moussavi-Baygi
et al., 2011; Peters, 2009; Wagner et al., 2015; Yamada et al.,
2010). These models vary mainly in how FG-Nups are
arranged and potentially interlinked inside the NPC to create a tight
barrier. However, common to all these models is that the
concentration of FG-repeats of about 50 mM creates a very crowded
environment, which is roughly in line with stoichiometric
measurements of Nups and the overall size of the central channel
(Bui et al., 2013; Ori et al., 2013). Independently of the transport
model assumed, mobility of an NTR inside the barrier is thus
largely limited by the koff and kon of the interaction between
FG-Nups and NTRs. This is also the case if FGs interact with
FGs inside the barrier as proposed in the selective phase model
(Frey and Görlich, 2007), as long as these interactions are highly
dynamic and do not pose a substantial energetic barrier or
rate-limiting step to be melted. That we do not observe obvious
FG-FG interactions in our studies is thus not necessarily inconsistent
with such a model.

If we were to naively consider the characteristic time for a
single Nup and Importinβ to separate based on commonly
measured fast kon and affinities (e.g., Kd (Nup·NTR) ~100 nM
and kon ~10⁸ M⁻¹s⁻¹ → unbinding time (UT) ~100 ms), it
appears impossible that Importinβ could cross a 50 mM FG-filled
pore within 5 ms. This is the previously described "transport
paradox," in which high specificity is somehow coupled with
rapid transport (Bednenko et al., 2003; Ben-Efraim and Gerace,
2001; Tetenbaum-Novatt et al., 2012; Tu et al., 2013).
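
The naive estimate behind the paradox follows from koff = Kd × kon and
UT = 1/koff. A minimal sketch of this arithmetic is given below,
assuming a simple two-state 1:1 interaction and taking kon ~10⁸ M⁻¹s⁻¹,
the value consistent with the quoted Kd of ~100 nM and unbinding time of
~100 ms. The function name is hypothetical.

```python
def unbinding_time_s(kd_molar: float, kon_per_molar_s: float) -> float:
    """Mean lifetime of a 1:1 complex: UT = 1/koff, with koff = Kd * kon."""
    koff_per_s = kd_molar * kon_per_molar_s
    return 1.0 / koff_per_s


# Kd ~100 nM is quoted in the text; kon ~1e8 M^-1 s^-1 is the assumed rate.
ut = unbinding_time_s(kd_molar=100e-9, kon_per_molar_s=1e8)
print(f"unbinding time ~ {ut * 1e3:.0f} ms")
```

With these inputs the complex lifetime is about twenty times longer than
the 5 ms transport time, which is the mismatch the text refers to.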

Our work (down to picosecond and atomic resolution) is 
largely compatible with the existing barrier models, as it ad- 
dresses on a molecular mechanistic level how an NTR could 
rapidly pass through a dense barrier. Using a simple model of 
a bivalent system, we already expect an order of magnitude dif- 
ference between the dissociation rate for an individual motif and 
that for the whole protein (Kramer and Karpen, 1998). We have 
also observed extremely rapid association rates (~10^ M“^s“^) 
and in Supplemental Experimental Procedures (two toy models)
we outline that, if we consider a very rough estimate for the
characteristic time of an individual motif unbinding event (UT ~1 µs)
for full-length Nup153 (>24 valencies), it becomes clear that
Importinβ could "creep" through the dense FG-motif plug of the
pore within the short transport time. Such movement is consistent
with our (Figure 6) and other NTR diffusion studies through
NPCs in intact cells and various model systems (Eisele et al.,
2013; Frey and Görlich, 2007; Jovanovic-Talisman et al., 2009;
Moussavi-Baygi et al., 2011; Schleicher et al., 2014; Tu et al.,
2013; Wagner et al., 2015).

In this case, nature has achieved a combination of high spec- 
ificity with fast interaction rates. This is based on many individual 
low-affinity motifs paired with a binding mode that requires rela- 
tively little energy or time investment for the Nup to transit be- 
tween free and bound conformations, and provides a rationale 
for the fast, yet specific, nuclear transport. While rapid binding 
can in principle be realized between proteins of single binding el- 
ements (e.g., driven by strong electrostatics), the proofreading 
emanating from the multiplicity and rapid repetition of many 
such events is what contributes to specific transport. 

We note that the transport paradox goes far beyond the
relevance for the transport mechanism, since transient but targeted
interactions are central to the emerging view of highly dynamic
protein (and other biomolecular) interaction networks.
Furthermore, FG-repeats are also present in stress and P granules
(Toretsky and Wright, 2014). It seems likely that such ultrafast
binding mechanisms are also important for other biological
recognition processes in which individual interaction motifs make
only weak contributions, e.g., in the recognition of glycans
(Ziarek et al., 2013), of other very short linear motifs such as WG
motifs in small RNA pathways (Chekulaeva et al., 2010), or in the
binding of proteins to epigenetic marks such as many histone
modifications.

In addition, ultrafast association is achieved by using the 
unique plasticity of multivalent disordered proteins, which is 
distinct from mechanisms where orientation specific binding is 
required for complex formation. This represents an additional 
biological advantage for IDPs in comparison to folded proteins, 
and might have further facilitated their enrichment in organisms 
of higher complexity. 

EXPERIMENTAL PROCEDURES 

Expression and Purification of Importinβ, TRN1, NTF2, CRM1 and
Nup153FG

The proteins were purified essentially as described in (Milles and Lemke, 2014)
following routine column chromatography and then transferred into the
respective measurement buffers. Labeling of Nup153FG (amino acids 875 to 1475 of
the full-length Nup153; numbering with respect to the full-length protein as in
'UniProt: P49790') was performed using routine procedures to introduce
Alexa488 as a donor and Alexa594 as an acceptor dye for smFRET
experiments (and analogously for other dyes), as described in (Milles and Lemke, 2011).

NMR Studies of Nup153FG^^'"® 

Spectral assignments of NuplOOFO^^*^® were obtained from a set of 

BEST-TROSY-type triple resonance spectra: HNCO, intra-residue HN(CA)CO, 
HN(CO)CA, intra-residue HNCA, HN(COCA)CB, and intra-residue HN(CA)CB 
(Solyom et al., 2013). For the measurements of RDCs, "'^C, Nup153FG^^'"® 

was aligned in 12 mg/ml Pf1 phages yielding a D2O splitting of 2.16 Hz. RDCs
were measured using BEST-type HNCO and HN(CO)CA experiments (Rasia
et al., 2011). 15N relaxation dispersion was carried out at
Nup153FG^PxFG/Importinβ concentrations of 250 µM and 180 µM, respectively, applying CPMG
frequencies between 25 and 1,000 Hz (Schneider et al., 2015). All experiments
were performed in Na-phosphate buffer (pH 6), 150 mM NaCl, 2 mM DTT,
5 mM MgCl2, at 25°C and at a 1H frequency of 600 MHz if not noted otherwise.

The conformational space available to disordered Nup153FG'^^'^® was 
sampled using the Flexible-meccano statistical coil description (Ozenne 
et al., 2012) and representative ensembles in agreement with experimental 
chemical shifts were selected using ASTEROIDS (Jensen et al., 2010) and 
the ensemble was subsequently cross-validated against experimental RDCs 
and SAXS. 

SmFRET Experiments 

SmFRET measurements of dual-labeled, freely diffusing proteins were
performed in a confocal geometry, detecting donor and acceptor intensities
(from which the FRET efficiency E_FRET is calculated) as well as fluorescence
lifetimes (τ) on a custom-built multiparameter setup as previously described
(Milles and Lemke, 2011).

Fluorescence Stopped-Flow Experiments

The association kinetics were monitored by following the fluorescence
anisotropy change of Nup153FG labeled at the indicated position with Cy3B (see
sequences in Supplemental Experimental Procedures) upon binding to different
concentrations of NTRs, under pseudo-first-order conditions. Anisotropy (r)
was calculated from fluorescence intensities measured with polarizing filters
in the parallel (∥) and perpendicular (⊥) position.

Each trace was obtained by averaging >30 traces, and background
fluorescence was then subtracted. The anisotropy traces were fit with a
biexponential function to determine kobs. The different kobs were plotted against the
respective NTR concentrations and were linearly fit to obtain the association
rate constant (kon) from the slope.
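
The two calculations described above can be sketched as follows. This is
a minimal illustration, not the IgorPro routines used in the study: it
uses the standard textbook definition of anisotropy with an assumed
instrument correction factor (G, set to 1 by default), skips the
biexponential fitting step by starting from kobs values, and runs on
synthetic, noise-free rates. All function names and numbers are
hypothetical.

```python
def anisotropy(i_par: float, i_perp: float, g_factor: float = 1.0) -> float:
    """Standard anisotropy r = (I_par - G*I_perp) / (I_par + 2*G*I_perp)."""
    return (i_par - g_factor * i_perp) / (i_par + 2.0 * g_factor * i_perp)


def fit_line(x, y):
    """Ordinary least-squares line; returns (slope, intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Synthetic example: under pseudo-first-order conditions,
# kobs = kon * [NTR] + koff, so the slope of kobs versus [NTR] is kon.
ntr_conc_molar = [0.5e-6, 1.0e-6, 2.0e-6, 4.0e-6]        # assumed titration
kobs_per_s = [1e8 * c + 5.0 for c in ntr_conc_molar]      # assumed rates

kon_fit, koff_fit = fit_line(ntr_conc_molar, kobs_per_s)
print(f"kon = {kon_fit:.3g} M^-1 s^-1, intercept = {koff_fit:.3g} s^-1")
```

In practice the intercept is often poorly determined when kon × [NTR]
dominates kobs, which is one reason to read kon from the slope only.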



The BioLogic (Grenoble, France) stopped-flow equipment used permits
automatic titration and repeated technical replicates, which typically yield a
small standard deviation. We derived an experimental error of ~20% in kon
measurements between different biological replicates. To be conservative,
we thus do not show the (typically lower) standard deviations from technical
replicates.

Transport Experiments 

Routine reconstitution of the nucleocytoplasmic transport machinery in
permeabilized cells was used, and fluorescent cargo (NLS-MBP-eGFP) was imaged on
a confocal microscope (Leica, Mannheim) at the indicated time points.

For single-molecule tracking of NTRs, the same assay was used, but
Importinβ-Alexa488 at single-molecule concentration was tracked with an
acquisition time of 2 ms on a previously described home-built imaging
microscope (Ori et al., 2013).

All data analyses for FSF, FCS, smFRET, and tracking were performed with
custom-written routines in IgorPro (Wavemetrics, OR).

MD and BD Simulations 

The Nup153FG'^^'^® fragment was modeled on the basis of its sequence 
that also included the exogenously inserted residues used for labeling of the 
fragment with fluorophores. For the binding simulations, Nup153FG^^'^® or 
Nupl 53 '^®'^^'^° were randomly placed in a box of dimensions 15 x 15 x 
15 nm³ together with the N-terminal segment of Importinβ^N (PDB: 1F59).
Brownian Dynamics (BD) simulations were performed starting from the MD complex
that showed a specific association between the partners and resembled the
crystallographic binding pose reported previously (Bayliss et al., 2000).

SUMMARY

A key effector route of the Sugar Code involves lec- 
tins that exert crucial regulatory controls by target- 
ing distinct cellular glycans. We demonstrate that 
a single amino-acid substitution in a banana lectin, 
replacing histidine 84 with a threonine, signifi- 
cantly reduces its mitogenicity, while preserving 
its broad-spectrum antiviral potency. X-ray crystal- 
lography, NMR spectroscopy, and glycocluster as- 
says reveal that loss of mitogenicity is strongly 
correlated with loss of pi-pi stacking between aro- 
matic amino acids H84 and Y83, which removes a 
wall separating two carbohydrate binding sites, 
thus diminishing multivalent interactions. On the 
other hand, monovalent interactions and antiviral 
activity are preserved by retaining other wild-type 
conformational features and possibly through 
unique contacts involving the T84 side chain. 
Through such fine-tuning, target selection and 
downstream effects of a lectin can be modulated 
so as to knock down one activity, while preserving 
another, thus providing tools for therapeutics and 
for understanding the Sugar Code. 

746 Cell 163, 746-758, October 22, 2015 ©2015 Elsevier Inc. 



INTRODUCTION 

Protein-carbohydrate interactions play essential roles in many 
biological processes, including adhesion and growth regulation, 
infection, and tumor pathogenesis (Gabius, 2015; Solis et al., 
2015). Glycan-encoded information can be translated into 
cellular effects by receptors, termed lectins (Boyd, 1954). These 
carbohydrate-binding proteins are widely found in nature, have 
been put to considerable use in many aspects of glycobiology 
(Andre et al., 2015; Gabius et al., 2011, 2015), and have the
potential to be used as antiviral agents. By specifically binding to
mannosides of the glycans of glycoproteins on the surface of a 
virus, they can block viral attachment and/or fusion to cells. 

Possible clinical applications of lectins suffer from a major 
drawback, the potential for side effects mediated by
lectin-induced mitogenicity (Borrebaeck and Carlsson, 1989). If a
mitogenic lectin were used topically in an anti-HIV microbicide, it
could lead to uncomfortable inflammation, an increase in viral 
transmission, and even greater HIV replication because of its 
ability to activate T cells. Given parenterally, a mitogenic lectin 
could lead to systemic inflammation (Huskens et al., 2008). To 
date, it has remained entirely unclear whether mitogenicity and 
antiviral activity are dissectible in a given lectin. 

We set out to rationally engineer a plant lectin isolated from the 
fruit of bananas (Musa acuminata, BanLec) (Singh et al., 2014), 
so as to eliminate its mitogenicity, while retaining its potent 

Figure 1. The H84T BanLec Mutant Is Significantly Less Mitogenic than Is WT BanLec

(A) Comparisons of the mitogenic activity of H84T to recombinant WT BanLec. PBLs from two different donors were treated with varying concentrations of lectin
for 3 days and then tested for mitogenic activity by measuring BrdU incorporation by ELISA. A stimulation index of less than ten (gray line) is considered
non-mitogenic. The samples for each donor were analyzed in triplicate, and error bars represent SEM. The D133G BanLec mutant, in which CBS I is altered (see
Figure 4), is not mitogenic but also lacks any antiviral activity.

(B) Induction of the activation marker CD69 on CD4 T cells in the presence of WT or H84T as measured by flow cytometry, 1 or 3 days post-treatment.

(C) Induction of cytokines/chemokines by WT and H84T BanLec. PBMCs from healthy donors were incubated for 72 hr with WT or H84T BanLec at 2 µg/ml.
Supernatants were collected, and cytokine levels were measured by the Bio-Plex array system. The fold-increase values of the cytokine concentrations in the
supernatant of stimulated PBMCs with respect to the concentrations in the supernatant of untreated PBMCs were determined for samples from four different
donors. The fold-increase values are divided into subgroups: 1- to 3-fold increase (white squares), 3- to 10-fold increase (yellow squares), 10- to 100-fold increase
(orange squares), 100- to 500-fold increase (dark red squares), and >500-fold increase (black squares).

See also Figure S1.



antiviral activity. BanLec is a member of the mannose-specific 
jacalin-related lectin (mJRL) group that functions as a potent 
T cell mitogen (Singh et al., 2014). It forms a dimer with two car- 
bohydrate-binding sites (CBS I and CBS II) in each protein 
subunit (Meagher et al., 2005; Singh et al., 2005). BanLec avidly 
associates with high-mannose-type N-glycans on the HIV-1 en- 
velope and can thus block viral entry into cells (Swanson et al., 
2010; Ferir et al., 2011). Here, we show that a mutation within 
the sugar-binding site in BanLec makes it possible to signifi- 
cantly decrease mitogenic activity without compromising anti- 
viral activity against HIV, hepatitis C virus (HCV), and influenza 
virus, all of which have high-mannose-type N-glycans on their 
surfaces. This new form of BanLec thus has the potential to be 
used as a broad-spectrum antiviral agent, something that is 
presently not available in the clinic. Further, we detail the molec- 
ular basis for separating these two distinct activities of the lectin. 
Our results provide proof of the feasibility of re-engineering 
target specificity and activity of a lectin, an approach that will 
greatly help to clarify how lectins read and transmit information 
through the Sugar Code, the biochemical platform that turns 
complex, sugar-encoded information into a broad spectrum of 
biological activities (Gabius et al., 2011; Murphy et al., 2013; 
Solis et al., 2015). 

RESULTS 

The Antiviral and Mitogenic Activities of BanLec Can 
Be Uncoupled through the Substitution of a Single 
Amino Acid 

The BanLec cDNA was cloned, and the recombinant protein,
containing a 6x His-tag with the sequence LEHHHHHH, was
expressed in Escherichia coli. Unless stated otherwise, all of the
BanLec proteins utilized in this study are recombinant versions
containing a His-tag. The recombinant His-tagged version of
BanLec maintains mannose-binding properties as measured
by isothermal titration calorimetry (ITC) (see discussion below)
and anti-HIV activity (Figure S1). Natural BanLec is a mitogen
(Gavrovic-Jankulovic et al., 2008), and we confirmed this finding
with the recombinant version by exposing peripheral blood
lymphocytes (PBLs) to the lectin for 3 days and measuring
incorporation of bromodeoxyuridine (BrdU) (Figures 1A and S1).

To pinpoint potentially promising sites for mutational
engineering, we examined crystal structures of the β-prism I
structure, which is characteristic for the JRL family (Meagher et al.,
2005; Singh et al., 2005). This fold consists of three Greek key
structures composed of β strands; distinct loops found in the
Greek keys play a role in carbohydrate binding. The first and
second Greek keys include the JRL consensus motif GXXXD
for sugar binding, and when mutations were introduced into
these Greek keys, they abolished the mitogenic activity (as
seen with the D133G mutant shown in Figure 1A), but also
resulted in a loss of almost all anti-HIV activity (data not shown).
The third Greek key varies among JRL members in length and
sequence and is thought to play a role in binding glycan
structures beyond simple saccharides (Nakamura-Tsuruta et al.,
2008). H84 is part of this third loop, known to be involved in
binding the second sugar moiety in α1,6-dimannosides (Singh
et al., 2005). Therefore, we reasoned that altering this amino
acid might result in a change in binding characteristics that
would affect the lectin's mitogenic and antiviral activities
differentially.

Several H84 BanLec mutants were constructed (see further
discussion below), and one variant, H84T, in which the histidine
is replaced by a threonine, was found to not stimulate the
proliferation of lymphocytes at concentrations up to 1 µM (Figure 1A).
While increased cell-surface expression of the activation marker
CD69 was observed for BanLec-treated CD4+ peripheral blood
mononuclear cells (PBMCs), the H84T variant induced very little
stimulation of this same marker (Figure 1B). Moreover, wild-type
(WT) BanLec consistently caused a large increase in the
induction of cytokines from the PBMCs of multiple individual donors,
whereas the response to H84T was markedly reduced
(Figure 1C). Thus, H84T, unlike naturally occurring and WT
recombinant BanLec, is minimally mitogenic when tested by three
independent methods on the peripheral blood cells of multiple
different donors.

Table 1. Anti-HIV Activity Profile of BanLec, H84T BanLec, Microvirin, and the 2G12 Monoclonal Antibody in PBMCs

Strain           NL4.3      BaL        US2        DJ259      BZ163      BCF-DIOUM  BCF-KITA   BCF-06     BV-5061W
Clade                                  B          C          F          G          H
Co-receptor      (X4)       (R5)       (R5)       (R5)       (R5)       (R5)       (R5)       (X4)       (X4)
MVN^a            8          22         2          167        nd         nd         nd         >350       >350
BanLec^a         0.87       0.87       1.1        2.2        2.5        6.5        3.6        14         3.7
H84T BanLec^a    2.1        0.93       1.5        0.47       3.1        4.1        1.2        0.73       0.33
2G12 mAb^b       140        3,710      40         >50,000    >20,000    >20,000    >20,000    >20,000    >20,000

Viral co-receptor usage (R5 or X4) is determined in U87.CD4.CCR5 and U87.CD4.CXCR4 cells and indicated in parentheses. MVN, Microvirin; nd, not
determined.

^a 50% inhibitory concentration (IC50) in nanomolar required to inhibit viral p24 (for HIV-1) or p27 (for HIV-2) production by 50% in PBMCs.
^b Antibody concentration in nanograms per milliliter required to inhibit viral p24 (for HIV-1) or p27 (for HIV-2) production by 50% in PBMCs.

In contrast to its loss of mitogenicity, the H84T variant had an
IC50 value against HIV in the low nanomolar range and was
as effective as WT BanLec at inhibiting a wide range of HIV
isolates, including multiple clinical isolates from different
clades of group M, a group O clinical isolate, and a clinical isolate
of HIV-2 (Table 1). Of note, a number of the isolates that were
susceptible to H84T at low nanomolar concentrations required 
higher concentrations of the anti-HIV lectin Microvirin and/or 
were very difficult to inhibit with 2G12, a classic neutralizing 
anti-HIV monoclonal antibody (Table 1). Recombinant H84T 
without the His-tag showed very similar anti-HIV activity (data 
not shown). 

To determine the capacity of H84T BanLec to prevent mucosal
HIV transmission, we utilized the bone-marrow-liver-thymus
(BLT) humanized mouse model (Wahl et al., 2012). H84T or
PBS (the carrier) was topically applied to the vagina prior to
challenge with HIV-1 JR-CSF. A total of 50% of the mice treated
vaginally with PBS became infected, as determined by the presence
of viral RNA in the plasma. In contrast, none of the mice treated
topically with H84T showed detectable levels of viral RNA in the
plasma during the course of the experiment (p = 0.0359;
Figure 2A).

The antiviral efficacy of H84T was further evaluated against
another important pathogenic virus that presents
oligomannoside chains on its surface proteins, hepatitis C virus (HCV;
Goffard et al., 2005). An intergenotypic HCVcc reporter virus,
i.e., BiGluc-Con1/Jc1, was tested in Huh-7.5 cells (Figure S2)
(Reyes-del Valle et al., 2012). The addition of H84T to the
inoculum decreased HCV in a dose-dependent manner and to levels
comparable to inhibition by CD81 antibody, a positive control
that blocks the cellular receptor for HCV (Figure S2A; data not
shown). Co-incubation of virus inoculum with the BanLec
derivative D133G/D38A, which, similar to the D133G mutant (Figure 1A),
is inactive, was found to not decrease viral replication
(Figure S2B). At the EC90 concentration (determined in Huh-7.5
cells), H84T also reduced HCV replication to levels similar to
neutralizing E2 antibody in a primary human fetal liver culture
(data not shown). Finally, to determine if the H84T-specific
reduction of HCV was due to inhibition of viral RNA replication,
the effect of H84T BanLec was monitored in Huh-7.5 CD81
knockdown cells (Figure S2C). In this single-cycle assay,
H84T decreased HCV replication over time in the control cell
background only, further supporting the hypothesis that H84T
inhibits viral replication at entry (receptor binding, membrane
fusion), consistent with what we previously observed with WT
BanLec against HIV (Swanson et al., 2010).

Glycosylation sites on the HCV E1 and E2 envelope proteins
are highly conserved across genotypes (Goffard et al., 2005).
Utilizing a panel of chimeric Gaussia luciferase reporter viruses, in
which the structural region (core-NS2) was encoded by differing
genotypes, H84T was observed to decrease HCV replication
in a dose-dependent manner (Figures 2B-2J; Table S1). H84T
BanLec thus appears to be a pan-genotypic inhibitor of HCV
infection.

The hemagglutinin of influenza A viruses bears high-mannose- 
type N-glycans that are susceptible to host lectins (Ng et al., 
2012). In studies employing a retroviral core pseudotyped with 
the hemagglutinins of the 1918 H1N1 and the H5N1 avian 
pandemic influenza viruses, WT and H84T BanLec were both 
very active and equally inhibitory (Figures 2K and 2L). 

Next, we found that H84T BanLec is very active against
multiple WT strains of influenza A tested in MDCK cells in tissue
culture. Significant activity was seen against A/California/04/2009
(H1N1 pandemic strain), California/07/2009 (H1N1 pandemic
strain), A/New York/18/2009 (H1N1 pandemic strain), and
Perth/16/2009 (H3N2) with EC50 values of 1-4 µg/ml versus
H1N1 virus and 0.06-0.1 versus H3N2 virus. A mutant form of
BanLec that does not bind mannose, D133G/D38A, had no
activity, excluding carbohydrate-independent effects. Importantly,
significant activity was also seen with H84T against the
Duck/MN/1525/81 H5N1 avian strain (EC50 of 5-11 µg/ml), confirming
our results obtained with pseudotyped virus (Figure 2L). Finally,
as some mouse-adapted strains of influenza lack mannose
on their hemagglutinin (Smee et al., 2008), we tested an H1N1






[Figure 2 graphic not reproduced. Recoverable axis labels: [Lectin] (nM); Post-Virus Exposure Time (days).]

Figure 2. H84T BanLec Has Potent Antiviral Activity In Vitro and In Vivo 

(A) Protection from vaginal HIV-1 JR-CSF infection of BLT humanized mice by H84T BanLec. Mice were vaginally exposed to HIV in the presence or absence of
topical H84T. HIV infection was determined by the presence of plasma viral load over a period of observation of 6 weeks. The times to plasma viremia were then
combined to generate a Kaplan-Meier plot of the protection from vaginal HIV infection provided by H84T BanLec. Log rank analysis (p = 0.0359) confirmed that
topically administered H84T prevents vaginal HIV-1 JR-CSF infection in BLT mice.

(B-J) Increasing concentrations of H84T (0, 10, 20, 40, 80, 160, 320, 640, and 1,280 nM) were mixed with the indicated HCVcc inoculum at an MOI of 0.1 or 0.05.
After 6 hr incubation, cells were washed and media containing additional lectin was added. At 72 hr post-infection, HCV replication was analyzed by luciferase
activity in supernatants. All HCVcc were bicistronic Gaussia luciferase reporter genomes, of which structural proteins were encoded by differing genotypes as
indicated. The means and SD are plotted for two independent experiments containing five replicates each. The corresponding EC50/90 values and their respective
confidence intervals were determined and are displayed in Table S1. See also Figure S2.

(K) The activity of WT or H84T BanLec against the 1918 H1N1 pandemic influenza strain as measured by luciferase assay in the pseudotyped virus system
described in the Experimental Procedures.

(L) The activity of WT and H84T against the H5N1 avian influenza strain as assessed in (K).

(M) Survival of mice challenged intranasally with influenza and then treated with H84T BanLec or control intranasally 4 hr after challenge and then daily for
5 days.



(A/WSN/1933) isolate previously shown to be inhibited by
mannose-binding proteins for its sensitivity to our new agent. 
H84T was indeed quite active against this H1N1 strain, which 
causes disease in mice. Most importantly, we found that intra- 
nasal (IN) H84T BanLec, first given 4 hr after IN viral challenge, 
effectively blocks influenza infection in the mouse model (Fig- 
ure 2M). Taken together, studies with pseudotyped virus, WT 
virus in tissue culture, and a mouse model of influenza demon- 
strate significant activity of H84T against multiple strains of 
influenza. 



H84T BanLec Is Less Active in Multivalent Interactions 

To begin to delineate the basis for the H84T mutant protein’s 
markedly decreased mitogenic and pro-inflammatory activity, 
while yet maintaining its potent antiviral capacity, binding 
properties of H84T and WT BanLec to monovalent sugars in so- 
lution were compared. The association constants (Ka) measured 
using isothermal titration calorimetry (ITC) for binding to methyl 
α-D-mannopyranoside were similar for recombinant His-tagged 
WT (383 mM⁻¹) and H84T (353 mM⁻¹) and were consistent 
with previous measurements for naturally occurring BanLec 
(333 mM⁻¹) (Mo et al., 2001; Winter et al., 2005). Interestingly, 
slightly weaker affinities were observed for H84T as compared 
to WT when analyzing binding to dimannoside (300 versus 
227 mM⁻¹ for WT and H84T, respectively) (Table S2). 
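To put these differences in association constants on an energy scale, the following minimal sketch converts a ratio of Ka values into a difference in binding free energy using ΔG = -RT ln Ka. The function name and the temperature of 298.15 K are assumptions made for illustration (the ITC temperature is not stated in this section). Because only the ratio of the two constants enters, the result does not depend on the concentration unit of Ka.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def ddg_from_ka(ka_wt, ka_mut, temp_k=298.15):
    """Return ddG = dG(mutant) - dG(WT) in kcal/mol.

    With dG = -RT ln(Ka), ddG = RT ln(Ka_wt / Ka_mut).
    A positive value means the mutant binds more weakly than WT.
    temp_k is an assumed temperature, not a value from the paper.
    """
    if ka_wt <= 0 or ka_mut <= 0:
        raise ValueError("association constants must be positive")
    return R_KCAL * temp_k * math.log(ka_wt / ka_mut)


# Ka values quoted in the text (WT versus H84T)
ddg_monosaccharide = ddg_from_ka(383.0, 353.0)
ddg_dimannoside = ddg_from_ka(300.0, 227.0)
```

Under these assumptions both differences come out well below 1 kcal/mol, with the dimannoside value the larger of the two, in line with the description of the affinities as similar for the monosaccharide and only slightly weaker for the disaccharide.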

As mitogenicity involves cross-linking of distinct counterre- 
ceptors on cell surfaces that trigger outside-in signaling, the 
loss of mitogenicity seen with H84T and its slightly diminished 
binding affinity for disaccharides compared to monosaccharides 
suggested that the biological differences between the two pro- 
teins might arise due to the differences in their binding properties 
to more complex glycans. A simple assay that provides insight 
into binding to cell-surface glycans and cross-linking activity 
(here in trans, that is, between cells) is measuring lectin-induced 
aggregate formation of erythrocytes. The minimal concentrations 
for agglutination were found to be significantly different, 
i.e., 3 and 437 μg/ml for WT and H84T, respectively (Table 
S2). This result reveals a marked disparity in building stable 
aggregates based on more than monovalent interactions with 
cell-surface mannosides. 

Synthetic glycoclusters are excellent tools that range in size 
from bivalent compounds to glycodendrimersomes (Murphy 
et al., 2013; Solis et al., 2015), so their locally increased density 
of ligands will trace a change in the interaction/association 
profile when testing WT and variant proteins under identical con- 
ditions. The association of a lectin with a ligand-bearing surface 
is sensitive to the presence of haptenic sugar, and its presenta- 
tion in local clusters can enhance its inhibitory capacity. 
Mimicking the natural display of high-affinity ligands, synthetic 
glycoconjugates (carbohydrates attached to a scaffold enabling 
oligo- to polyvalency) thus are able to interfere with lectin binding 
to ligand-presenting surfaces in quantitative terms. The design 
of glycoclusters and the determination of their inhibitory activity 
on lectin binding (to glycoproteins or to a cell), measured as the 
inhibitory concentration (IC) at which the extent of lectin binding 
to a glycoligand is reduced by 50% (IC50 value), provide a measure 
of the avidity of a lectin for multivalent associations. In total, 
we tested a panel of 11 bi- to dodecavalent glycoclusters systematically 
in titrations in two types of assay, one biochemical 
and one cellular. In both cases, the mannose-specific lectin 
concanavalin A was used as positive control, and lectin binding 
to the glycan-presenting matrix was ascertained to be saturable 
and dependent on carbohydrate presence. 

First, we established a surface rich in presentation of 
mannose residues. A neoglycoprotein (a conjugate of albumin 
and mannose derivatives) was adsorbed to the plastic surface 
of microtiter plate wells, building the matrix for letting the bio- 
tinylated lectins dock. The surface-associated label was then 
quantitatively assessed spectrophotometrically. Titrations of 
the extent of binding with increasing amounts of inhibitor were 
performed to determine the IC50 value; the glycoclusters (Figure 3A) 
were individually tested. As demonstrated by the 
example shown in Figure 3B, these experiments allowed us to 
determine IC50 values as a measure for sensitivity of lectin binding 
in the presence of inhibitors. Binding of the H84T mutant was 
found to be much more susceptible to glycocluster inhibition 
than was surface contact formation of the WT BanLec, consis- 
tent with the lower cross-linking capacity in hemagglutination 
(Tables S2 and S3). 
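As an illustration of how an IC50 value can be read from such an inhibition titration, the sketch below interpolates linearly on a log-concentration axis between the two data points that bracket 50% residual binding. This is a simplified stand-in, not the fitting procedure used for Table S3; the function name and the example data are hypothetical.

```python
import math


def ic50_log_interp(concentrations, fractional_binding):
    """Estimate IC50 by log-linear interpolation.

    concentrations: inhibitor concentrations (> 0, any consistent unit).
    fractional_binding: signal relative to the uninhibited control
        (1.0 = no inhibition, 0.0 = complete inhibition).
    Returns the concentration at which binding crosses 0.5,
    or None if the titration never crosses 50%.
    """
    points = sorted(zip(concentrations, fractional_binding))
    for (c_lo, s_lo), (c_hi, s_hi) in zip(points, points[1:]):
        if s_lo >= 0.5 >= s_hi and s_lo != s_hi:
            # fraction of the way from c_lo to c_hi on a log10 scale
            frac = (s_lo - 0.5) / (s_lo - s_hi)
            log_lo, log_hi = math.log10(c_lo), math.log10(c_hi)
            return 10 ** (log_lo + frac * (log_hi - log_lo))
    return None


# Hypothetical titration (concentrations in mM), for illustration only
example_ic50 = ic50_log_interp([0.01, 0.1, 1.0, 10.0], [0.9, 0.7, 0.3, 0.1])
```

A lower IC50 for the same glycocluster indicates that lectin binding to the surface is more easily competed, which is the sense in which H84T is described as more susceptible to inhibition than WT.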



To confirm the above and increase the biological relevance of 
the findings, we proceeded to monitor cell binding, using the 
surface of cultured cells as a platform for contact of the labeled 
lectins. Tested under identical conditions, WT reacted more 
strongly with cells than did H84T (Figures 3Ca and 3Cb). In addi- 
tion to testing the physiologic glycome profile on the cells, we 
increased the level of lectin-reactive high-mannose-type N-glycans 
by treating the cells with the α-mannosidase I inhibitor 
1-deoxymannojirimycin. Enhanced binding of both proteins was 
seen (Figures 3Cc and 3Cd), with the difference in mean fluores- 
cence intensity between H84T and WT being maintained. Thus, 
increased ligand availability did not reduce the relative difference 
between H84T and WT proteins. Glycocluster testing on cells, for 
example, the tetravalent compound 11 (Figures 3Ce and 3Cf), 
fully confirmed the differential sensitivity seen in the solid-phase 
assays. These results are completely consistent with the 
decreased capacity of H84T BanLec to agglutinate erythrocytes 
and further confirm that H84T and WT differentially interact with 
multivalent surfaces, but not with the monosaccharide. 

High-Resolution X-Ray Structures Reveal Loss of Pi-Pi 
Stacking between Y83 and H84 and an Altered Sugar 
Contact Profile in H84T 

To examine the structural basis for the difference in carbohy- 
drate-binding modes between WT and H84T, we determined 
the crystal structures of the recombinant proteins both in the 
absence and in the presence of dimannoside (M2) (Figure 4). 
The X-ray structure of recombinant WT BanLec was very similar 
to its naturally occurring, purified counterpart (Meagher et al., 
2005), consistent with the similar biological activities of the two 
proteins. The monomer forms a β-prism I fold containing three 
Greek key motifs with 3-fold symmetry and two carbohydrate- 
binding sites (CBS I and II). CBS I consists of loops on the 
top of the first Greek key; CBS II sits on the top of the second 
Greek key. The two binding sites are separated by a loop (resi- 
dues 83-88) within the third Greek key (Figure 4A), which has 
been suggested to be an important determinant of carbohydrate 
binding specificity (Meagher et al., 2005); H84 is within this loop. 
It is worth noting that glycerol units were observed in the different 
binding sites of the WT protein. 

Both recombinant His-tagged proteins (WT and H84T) and 
the WT from bananas form a dimeric structure with an interface 
between β strand 1 (residues 4-10), β strand 10 (residues 
110-118), and two C-terminal residues (E140 and P141) from 
each monomer, resulting in a quasi-eight-stranded β sandwich 
structure. The C-terminal His-tag on recombinant WT and H84T 
neither altered the dimer interface nor disrupted the 
non-biological asymmetric tetramer that formed due to crystal 
packing in all the reported BanLec crystals. 

Apo WT and H84T form very similar structures as indicated by 
an overall root-mean-square deviation of 0.26 Å. Nevertheless, 
there are significant differences in and around the site of muta- 
tion. In WT, H84 stacks on Y83 to form a pi-pi stacking interac- 
tion that directs both residues toward CBS II, resulting in a 
“wall” that separates the two CBS (Figure 4B). In sharp contrast, 
in H84T, no pi-pi stacking can occur (Figure 4C). Instead, the 
threonine side chain points toward CBS I (Figure 4C). In WT, 
Figure 3. Binding of H84T and WT BanLec to Glycoclusters 

(A) Structures of the tested glycoclusters. 

(B) Titration curves for relative signal intensity, reflecting the extent of binding of WT (blue) and H84T mutant (yellow) BanLec proteins to surface-immobilized 
neoglycoprotein in the presence of increasing amounts of the tetravalent maltose-presenting glycocluster (11). 

(C) Semilogarithmic illustration of fluorescent surface staining of human SW480 colon adenocarcinoma cells by labeled WT (left) or H84T (right) BanLec. Control 
for background (0% value) is given as the gray area, and quantitative data (percentage of positive cells/mean fluorescence intensity) are presented. Lectin staining 
was monitored with increasing concentrations (1, 2, and 5 μg/ml; given in a and b), at 2 μg/ml with cells without (gray) or after treatment (black) with 
1-deoxymannojirimycin (c and d), and at 1.5 (WT) or 3 (H84T) μg/ml with the tetravalent glycocluster 11 at 1 mM (WT) or 0.75 mM (H84T) (e and f). 

See also Tables S2 and S3. 



the H84/Y83 stack prevents the side chain of residue 84 from 
pointing toward the CBS I. 

The X-ray structures of WT and H84T bound to a dimannoside 
(M2) feature two dimers in the asymmetric unit forming a non- 
biological asymmetric tetramer, and four sets of CBS each 
bound to a dimannoside molecule. The position of the first 
mannose moiety of M2 is well resolved in the electron density 
maps of CBS I and II of both proteins, suggesting that it is tightly 
bound to both structures (Figures 4B, 4C, and S3). In CBS I of the 



WT and H84T, there are five hydrogen bonds (H-bonds) between 
each protein and the first mannose moiety, involving OD1 and 
OD2 of D133 and the backbone N of G15, K130, and F131. In 
CBS II, there are six H-bonds stabilizing the position of the 
saccharide, which include side-chain atoms OD1 and OD2 of 
D38 and the backbone N of N35, V36, and G60. 

The main difference in ligand binding between the proteins in- 
volves the second mannose moiety that is more accessible to 
solvent and residue 84. This second mannose moiety gives 
Figure 4. A Comparison of the Crystal 
Structures of Recombinant WT BanLec 
and Its H84T Mutant 

(A) Overlay of the structures of a monomer of re- 
combinant WT (blue) and H84T (yellow) BanLec. 
Both structures are presented as cartoons with 
residue 84 shown in ball and stick with oxygen 
atoms in red, nitrogen atoms in blue, and carbon 
atoms in the color of the monomer. The N and C 
termini are labeled. The right image is the result of 
rotating the left image 90° toward the viewer. CBS, 
carbohydrate-binding site. 

(B and C) Binding of a dimannoside to WT BanLec 
in blue (B) and to the H84T mutant in yellow (C). A 
disaccharide is shown in gray, and individual 
atoms are colored as in (A). Residues involved in 
hydrogen bonding are shown in ball and stick, and 
hydrogen bonds are shown as dashed lines. The 
pi-pi stacking between Y83 and H84 in the WT 
protein is circled. 

See also Figure S3 and Tables S4 and S5. 




visible electron density in CBS I for three out of four chains of the 
WT protein and all four chains of the H84T protein, but is present 
in the CBS II for only one H84T chain. For the CBS I site, each 
protein makes one H-bond with the second mannose moiety. 
In WT, the H84 side chain does not engage in H-bonds with 
the second mannose moiety in the CBS I pocket (Figure 4B), 
while in H84T, the side chain of T84 swings into the CBS I pocket 
to form an H-bond with the O1 hydroxyl oxygen of the sugar (Figure 4C). 
The existence of pi-pi stacking locks the imidazole ring 
of H84 toward the CBS II, and its loss in H84T allows for this 
reorientation toward the CBS I. Thus, although the global struc- 
tures of WT and H84T are not markedly different, the loss of pi-pi 
stacking alters the carbohydrate-protein contacts and topologi- 
cal presentation of the carbohydrate-binding site, potentially ex- 
plaining the difference in their biological behavior. 

NMR Spectroscopy and Molecular Dynamics 
Simulations Reveal Differences in the Structures of WT 
and H84T BanLec 

We used solution-state NMR spectroscopy to further delineate 
any differences between WT and H84T. NMR spectra showing 
a single set of resonances for the monomeric subunit are consis- 
tent with both WT and H84T forming symmetric oligomers. How- 
ever, both proteins exhibited a tendency to aggregate over time 



and this precluded application of multidi- 
mensional NMR experiments for reso- 
nance assignments (Sattler et al., 1999). 
Although BanLec is a dimer in solution 
at physiological pH (Khan et al., 2013), 
the X-ray structures reveal the possibility 
of BanLec forming asymmetric tetramers. 
Therefore it is probable that the high 
protein concentration used in NMR 
promotes the formation of higher-order 
aggregates. To reduce this tendency, 
we introduced two mutations: Y46K to 
disrupt the protein-protein interactions 
of the tetramer and V66D to increase protein hydrophilicity and 
to disrupt an additional crystal packing site. The resultant 
Y46K/V66D mutants of WT and H84T indeed formed stable dimers 
as judged by ¹⁵N NMR spin relaxation measurements 
(see below) and resulted in spectra very similar to those of BanLec 
without the Y46K/V66D mutations, with the differences primarily 
localized around the mutation site (Figures S4A and S5). 
The double-mutant version of the WT was used to obtain assign- 
ments, which were then used to assign its H84T counterpart and 
the corresponding BanLec proteins lacking the double mutation 
(Figures S4, S5, and S6). In agreement with the crystal struc- 
tures, we observed significant overlap when comparing the 2D 
¹⁵N-¹H HSQC spectra of WT and H84T, indicating that the two 
proteins adopt a similar fold (Figure 5A). However, significant dif- 
ferences in chemical shifts were observed in the third Greek key, 
indicating that the H84T mutation does affect the structural and/ 
or dynamic properties at this site. 

The chemical shift differences between WT and H84T span the 
entire ligand recognition loop (residues 83-88), which plays 
important roles in determining the carbohydrate-binding speci- 
ficity (Figures 5B and S4B). The mutation may broadly affect 
the conformation of this loop, possibly due to loss of pi-pi stacking 
as observed in the X-ray structure. We did not observe significant 
differences in the ¹⁵N NMR spin relaxation rates (Palmer, 
Figure 5. Solution NMR Spectroscopy and Molecular Dynamics Simulations Reveal Dynamic Differences in the Conformations of WT and 
H84T BanLec at the Third Greek Key 

(A) Comparison of H84T mutant and WT BanLec. HSQC spectra of WT (blue) and H84T BanLec (yellow). 

(B) Chemical shift changes induced by the H84T mutation color-coded on the structure of WT BanLec. 

(C) Chemical shift changes upon pentamannoside binding color-coded on the structure of WT BanLec. 

(D) Chemical shift differences between H84T and WT BanLec when interacting with sugar color-coded on the structure of WT BanLec. For (B), (C), and (D), the 
magnitude in chemical shift change increases from blue (no change) to red (maximal change). Gray corresponds to residues for which the change could not be 
accurately measured. Sugar moieties are in black. 

(E) Comparison of WT and H84T Lipari-Szabo order parameters (S² varies between zero and one for maximal to minimal flexibility/amplitude of motions, 
respectively) computed for WT (blue) and H84T (yellow) using accelerated MD. 

(F) Proposed mechanism for separating antiviral activity and mitogenicity using the H84T mutation. Top: in the apo-form (left), the pi-pi stacking interaction helps 
separate two binding pockets that can engage with branched N-glycans or sugar moieties on different glycan molecules, creating multivalent interactions, while 
in the H84T mutant loss of pi-pi stacking between residues 84 and 83 results in a more open binding pocket that can engage multiple sugar moieties on the same 
glycan molecule, limiting the possibility for multivalent interactions. The dashed line symbolizes the capability of the H84T side chain to interact with a sugar in the 
CBS I, which helps to retain the capability to interact with a single sugar, while mixing the recognition elements of the two binding sites. 

See also Figures S4, S5, S6, and S7. 



2004) for these and other sites, indicating that WT and H84T have 
similar dynamics at the picosecond to nanosecond timescales, 
as well as similar oligomerization states (Figure S7A). This is 
consistent with the similar dynamics observed for WT and 
H84T using conventional molecular dynamics (MD) simulations 
(Figure S7B). However, accelerated MD simulations (Markwick 
and McCammon, 2011), which can probe slower motions, 
showed higher flexibility in the ligand recognition loop (83-87) 
in H84T as compared to the WT protein, consistent with loss of 
stabilizing pi-pi stacking interactions (Figure 5E). 
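For orientation, a Lipari-Szabo order parameter can be estimated from a simulation as the long-time limit S² = (3/2) Σ⟨μiμj⟩² - 1/2, where μ is the unit N-H bond vector and the sum runs over all pairs of Cartesian components. The sketch below implements this expression under the assumption that the trajectory frames have already been superimposed to remove overall tumbling; it is not the analysis protocol used here, and the function name and test vectors are hypothetical.

```python
import math


def order_parameter_s2(bond_vectors):
    """Estimate S^2 for one bond from a list of (x, y, z) vectors.

    bond_vectors: one N-H vector per trajectory frame, taken after
        the frames were aligned to a common reference (assumed).
    Returns a value near 1 for a rigid bond and near 0 for
    isotropic reorientation.
    """
    if not bond_vectors:
        raise ValueError("need at least one frame")
    unit = []
    for x, y, z in bond_vectors:
        norm = math.sqrt(x * x + y * y + z * z)
        unit.append((x / norm, y / norm, z / norm))
    n_frames = len(unit)
    total = 0.0
    for i in range(3):
        for j in range(3):
            # time average of the product of two Cartesian components
            mean_ij = sum(u[i] * u[j] for u in unit) / n_frames
            total += mean_ij ** 2
    return 1.5 * total - 0.5


# Limiting cases: a bond that never moves, and one sampling all six axes
rigid = order_parameter_s2([(0.0, 0.0, 2.0)] * 5)
isotropic = order_parameter_s2(
    [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
)
```

In this picture, the higher flexibility reported for residues 83-87 in H84T corresponds to lower S² values for those residues than in WT.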

Next, we performed NMR chemical shift titrations to investi- 
gate the interaction of WT and H84T proteins with di- and pen- 
tamannosides in solution. The addition of dimannose to WT 
and H84T or pentamannose to their Y46K/V66D mutant versions 



resulted in significant chemical shift perturbations or broadening 
of resonances for residues in and around the sugar-binding 
pocket defined by the X-ray structure (Figures 5C, S4C, and 
S4D). In all cases, several resonances from residues involved 
in sugar binding disappear, e.g., K130 and F131, probably due 
to exchange broadening (Palmer, 2004). While the sites that 
experience chemical shift perturbations are very similar between 
WT and H84T, the perturbations are slightly larger for H84T and 
differ in direction, particularly for the recognition loop and when 
binding to pentamannose (Figures 5D, S4C, and S4D). Interestingly, 
the pentamannose-induced perturbations at 84, 85, and 86 
tend to diminish the differences at these sites observed in the 
absence of sugar, suggesting that sugar binding stabilizes a 
more similar backbone conformation for these sites (Figures 
Figure 6. Specific NMR Shifts Correlate with Mitogenicity 

(A) Comparison of the mitogenic activity of ten types of H84X mutants to WT BanLec. PBLs were treated with lectin for 3 days and tested for mitogenic activity by 
the incorporation of BrdU reported from an ELISA in relative luminescent units (RLUs). A stimulation index (RLUs of treated/RLUs of untreated) of less than ten 
(gray line) is considered non-mitogenic. The type of specific amino-acid substitution at position 84 for each mutant is indicated in each figure. Results with WT are 
plotted in blue for each comparison shown. 

(B) Antiviral activity of the same BanLec mutants. The anti-HIV activity of each BanLec variant was determined by its ability to block infection of TZM-bl cells with 
virus pseudotyped with the envelope from the HIV-1 BaL strain. The percentage of relative light unit (RLU) activity with increasing concentrations of lectin is 
plotted for each H84X mutation. 

(C) Comparison of NMR chemical shifts for ten representative H84X mutants of BanLec. ¹⁵N-¹H HSQCs of WT BanLec and the different mutants. The color coding 
is as follows: WT BanLec (blue), H84T (yellow), H84G (purple), aromatic mutants H84F, H84W, and H84Y (green), and non-aromatic mutants H84D, H84E, H84K, 
H84Q, and H84L (red). 



S4E and S4F). Overall, these data suggest a greater degree of 
conformational reorganization upon sugar binding in H84T as 
compared to WT and different sugar-binding modes for the 
two proteins, consistent with the X-ray structure. 

Correlation between Y83-H84 Stacking, NMR Chemical 
Shifts, Mitogenicity, and Antiviral Activity 

To further explore the correlation between Y83-H84 stacking, 
BanLec conformation, and biological activity, we systematically 
substituted H84 with amino acids that have different abilities to 
engage in stacking interactions and then examined the conse- 
quence on both NMR spectra and biological activity. A panel 
of ten H84 BanLec mutants was constructed, systematically re- 
placing the imidazole ring with other aromatic structures or with 
ionic, polar, or aliphatic groups, and even a hydrogen atom in 
H84G. These studies employed the version of BanLec without 
the double Y46K/V66D mutation. Replacing H84 with the aromatic 
residues tryptophan (H84W), tyrosine (H84Y), and phenyl- 
alanine (H84F), which can maintain favorable stacking interac- 
tions, had minimal effects on mitogenicity and anti-HIV activity 
(Figures 6A and 6B). In contrast, replacement of the imidazole 
by ionic, polar, or aliphatic side chains, including substitutions 
by the amino acids lysine (H84K), aspartic acid (H84D), glutamic 
acid (H84E), glutamine (H84Q), and leucine (H84L), resulted in 
the marked loss of both mitogenicity and anti-HIV activity (Fig- 



ures 6A and 6B). Only a single mutation (H84G) in this panel of 
protein variants yielded a drop in mitogenicity similar to, 
though smaller than, that seen with H84T, while preserving 
some anti-HIV activity. 

NMR spectra of the different mutants yielded excellent overall 
overlap, indicating that they all adopt a similar protein fold. The 
differences relative to WT protein were concentrated in the third 
Greek key (83-87) (Figure 6C). Further analysis of these differ- 
ences yielded an interesting trend for several residues; for A86, 
the resonances observed in all the mutants fall roughly along a 
straight line. Similar behaviors, too, were observed for V87 and 
V88, though the magnitude of the change is smaller and more 
difficult to resolve due to spectral overlap. Furthermore, mutants 
with aromatic residues (H84F, H84W, and H84Y) that can sup- 
port pi-pi stacking between residues 83-84 and that have higher 
mitogenicity and anti-HIV activity have A86 resonance clustering 
upfield along the line as compared to other mutants that disrupt 
pi-pi stacking and have lower mitogenicity and reduced anti-HIV 
activity (Figure 6). Interestingly, in H84G, which exhibits mitoge- 
nicity, A86 also clusters upfield with the other mitogenic mutants 
despite disruption of pi-pi stacking. A simple explanation is that 
BanLec exists in rapid dynamic equilibrium between two states 
and that the mutations differentially shift the relative population 
of the two states, with A86, situated on the opposite side of 
the third Greek key loop relative to 84, acting as a reporter for 
this equilibrium shift. For V87 and V88, such trends are more difficult 
to discern, but clearly resonances cluster depending on 
whether aromatic or non-aromatic residues are used in the sub- 
stitution. These two residues constitute the back part of the third 
Greek key loop. These results suggest that the mutants with ar- 
omatic residues maintain pi-pi stacking interactions between 
amino acids 83 and 84 (Figure 5F). 
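The two-state interpretation can be made concrete. In fast exchange, the observed shift in each dimension is the population-weighted average of the shifts of the two states, so peaks from different mutants fall on the straight line joining the two end states, and the position along that line reports the relative populations. The sketch below illustrates this relationship; the end-state shifts are hypothetical placeholders (they are not measured values from this study), and no scaling between the ¹H and ¹⁵N axes is applied.

```python
def fast_exchange_shift(pop_b, state_a, state_b):
    """Population-weighted (1H, 15N) shift for two states in fast exchange."""
    return tuple((1.0 - pop_b) * a + pop_b * b for a, b in zip(state_a, state_b))


def population_of_b(observed, state_a, state_b):
    """Fraction of state B from the projection of an observed peak
    onto the line joining the two end-state peaks."""
    d_h = state_b[0] - state_a[0]
    d_n = state_b[1] - state_a[1]
    length_sq = d_h * d_h + d_n * d_n
    if length_sq == 0:
        raise ValueError("end states must differ")
    off_h = observed[0] - state_a[0]
    off_n = observed[1] - state_a[1]
    return (off_h * d_h + off_n * d_n) / length_sq


# Hypothetical end-state shifts in ppm (1H, 15N), for illustration only
STATE_A = (8.0, 120.0)
STATE_B = (8.4, 122.0)

peak = fast_exchange_shift(0.3, STATE_A, STATE_B)
recovered = population_of_b(peak, STATE_A, STATE_B)
```

Read this way, a mutant whose A86 peak lies between the aromatic and non-aromatic clusters, as described for H84T in the following paragraph, would correspond to an intermediate population of the two states.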

Unlike other mutants, H84T retains antiviral activity despite the 
loss of mitogenicity. Interestingly, in H84T, the A86 resonance 
presents intermediate NMR characteristics between aromatic 
and non-aromatic mutants, whereas V87 and V88 cluster with 
non-aromatic residues. Additionally, in H84T G85 presents a 
very distinct signature, shared only with the H84G, which also re- 
tains some antiviral activity, indicating that the two mutants 
share some unique conformational properties. This suggests 
that the third Greek key loop in the H84T mutant uniquely com- 
bines conformational attributes of aromatic and non-aromatic 
mutants. 

Molecular Basis for Separating Two Activities of BanLec 

Both mitogenicity and antiviral activities of BanLec require asso- 
ciation with N-glycans, and so most mutations that block mito- 
genicity also abolish antiviral activity (Figures 1A and 6). Unlike 
other mutants, H84T retains high antiviral activity, which requires 
the capacity to home in on viral glycoproteins with sufficient af- 
finity. This is achieved despite disrupting pi-pi Y83-H84 stacking, 
which is important for sugar binding, possibly due to compensa- 
tory interactions between the side chain of T84 and the sugar and 
retention of WT-like conformational properties. 

In contrast, mitogenicity requires the ability to cross-link 
cognate binding partners, beyond a simple association. Our 
data suggest that the loss of 83-84 stacking decreases this ca- 
pacity in H84T and other mutants both due to slightly reduced 
sugar binding affinity (and possibly altered sugar binding spec- 
ificity) and also due to disruption of the wall that helps create 
two independent sugar-binding sites, each capable of interact- 
ing with a distinct glycan molecule (Figure 5F, left). Rather, in 
H84T, T84 rotates away from CBS II to interact with sugars in 
CBS I, effectively mixing recognition elements in the two bind- 
ing sites (Figure 5F, right). This more open binding pocket may 
make it more likely for the same glycan molecule to simulta- 
neously interact with the two sugar-binding sites and/or binding 
at one site may engage elements from the second site, result- 
ing in weaker binding affinity for a second glycan molecule 
(Figure 5F, right). This makes it less likely for H84T to simulta- 
neously interact with multiple glycan molecules as required 
for mitogenicity. 

DISCUSSION 

The Sugar Code underlies a key biological route of information 
transfer by which cell-to-cell interactions and cell signaling are 
orchestrated. Indeed, sugars can be considered the third type 
of biological alphabet, along with nucleotides and amino acids 
(Murphy et al., 2013). The receptors for glycans (lectins) are en- 
dowed with the capacity to target distinct counterreceptors by 
their structure and topological mode of presentation (Gabius 
et al., 2015). In doing so, lectins can play a vital role in regulating 



biological processes, such as cell growth and the immune 
response, and also serve as tools for studying structural aspects 
of glycobiology (Kaltner and Gabius, 2012). 

It has previously been observed that a single sugar unit can act 
as a switch for a complex-type glycan’s 3D structure, thus 
altering its ligand reactivity and subsequent signaling (Gabius 
et al., 2011). In the case of a bacterial lectin, the H57A substitu- 
tion in the cholera toxin B-subunit did not disrupt binding to the 
GM1 ganglioside, but did lead to loss of immunomodulatory ac- 
tivity and the ability to induce apoptosis, with altered loop posi- 
tion and rigidification affecting further cell surface contacts 
(Aman et al., 2001). SNPs occur naturally in the genes of human 
and animal lectins, and these natural sequence changes can 
affect the carbohydrate recognition domain and biological func- 
tion, as seen with a human galactose-binding lectin (Ruiz et al., 
2014). In this latter case, an impact on cell proliferation and 
trans-interactions has been inferred (Ruiz et al., 2014; Zhang 
et al., 2015a). Here, we have demonstrated that two distinct 
properties of a lectin can be separated through rational molecu- 
lar fine-tuning: BanLec can be engineered to essentially lose its 
mitogenicity while retaining very potent antiviral activity. The 
resultant H84T BanLec mutant is a broad-spectrum antiviral 
agent that is highly active against multiple strains of HCV, influ- 
enza, and HIV-1 in tissue culture and in vivo; it will also likely 
prove effective against other clinically important viruses with a 
suitable presentation of mannose on their surfaces. 

Our data suggest that loss of mitogenicity can be achieved by 
disrupting 83-84 stacking and disrupting a wall separating two 
sugar-binding pockets, thus diminishing polyvalent interactions. 
However, doing so while retaining antiviral activity requires a 
specific amino-acid substitution (H84T) that may help retain 
WT conformational properties, as well as possibly form unique 
contacts that can compensate for loss of interactions with the 
83-84 stack. It is possible that these basic design principles 
can be applied and extended to allow rational engineering of 
other lectins for use as antiviral tools and other therapeutic purposes. 
The recent demonstration that trans-interactions can be 
strengthened by the insertion of a linker into the homodimer of 
the antiviral galectin-1 (Zhang et al., 2015b) and the work pre- 
sented here encourage such efforts. While the term lectin etymo- 
logically stems from the Latin word “legere,” meaning to pick, 
choose, or select (Boyd, 1954), thus emphasizing the natural 
ability of these proteins to target specific carbohydrates, we 
have shown that lectins can be made yet more selective through 
molecular engineering. Our findings also suggest that custom- 
designed lectins can be employed to tease apart fine mecha- 
nisms of immune activation. In more general terms, this proof- 
of-principle work is likely to inspire the generation of new and 
innovative tools in the quest to delineate the intricacies of the 
Sugar Code. 

EXPERIMENTAL PROCEDURES 

Construction and Mutation of BanLec Expression Vectors and 
Purification of Recombinant BanLec Mutants 

The BanLec cDNA was cloned into a vector, allowing for expression of His- 
tagged protein in E. coli, mutagenesis, and purification over a nickel column 
as described in the Supplemental Experimental Procedures. 
Assessment of Anti-HIV Activity 

Assays testing the anti-HIV activity of WT and H84T BanLec in PBMCs were 
performed as described previously, measuring p24 for HIV-1 and p27 for 
HIV-2 (Ferir et al., 2011). For the TZM-bl cell assays, to each well of a white 
96-well plate 100 μl of a solution containing cells, resuspended at 1 x 10® 
cells/ml in DMEM medium with 25 mM HEPES and 10% FBS, was added. 
The next day, the medium was removed by aspiration and fresh medium 
containing lectin or PBS as a control was added to the plate at a concentration 
2-fold higher than the final concentration. After 30 min of incubation, virus 
diluted with medium was added, and the cells were incubated for 48 hr at 
37°C. After the incubation, 100 μl of medium were removed and replaced 
with 100 μl of ONE-Glo Luciferase reagent (Promega) for determination of 
luciferase expression. 
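The arithmetic behind this protocol can be sketched as follows. Lectin is added at twice the intended final concentration because the subsequent addition of virus dilutes it; the sketch assumes equal volumes of lectin-containing medium and virus inoculum, which is implied by the 2-fold factor but not stated explicitly. Infection is then expressed as a percentage of the luciferase signal in the PBS control. The function names, the optional background term, and the example numbers are hypothetical.

```python
def final_concentration(added_conc, volume_in_well, volume_added):
    """Concentration after a second solution is added to the well."""
    return added_conc * volume_in_well / (volume_in_well + volume_added)


def percent_of_control(rlu_treated, rlu_control, rlu_background=0.0):
    """Luciferase signal of a lectin-treated well as % of the PBS control.

    rlu_background (for example, uninfected cells) is an optional,
    assumed correction and defaults to zero.
    """
    denominator = rlu_control - rlu_background
    if denominator <= 0:
        raise ValueError("control signal must exceed background")
    return 100.0 * (rlu_treated - rlu_background) / denominator


# Illustrative numbers only: lectin plated at 20 nM in 50 ul, then 50 ul virus
lectin_final = final_concentration(20.0, 50.0, 50.0)
infection_pct = percent_of_control(2500.0, 10000.0)
```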

HCV Experiments 

The anti-HCV activity of BanLec derivatives was determined for different geno- 
typic chimeras in Huh-7.5 cells using bicistronic Gaussia luciferase reporter 
genomes as described in the Supplemental Experimental Procedures. 

Assessment of Anti-Influenza Activity 

The in vitro anti-influenza activity of H84T and its efficacy when administered 
via the intranasal route to female BALB/c mice challenged with influenza 
were assessed as described in the Supplemental Experimental Procedures. 

Hemagglutination Assay and ITC 

Hemagglutination assays conducted using rabbit erythrocytes and ITC were 
carried out as described in the Supplemental Experimental Procedures. 

Assessment of Mitogenic Activity by BrdU Incorporation 

Mitogenic activity was quantified as described in the legend of Figure 6 and 
further in the Supplemental Experimental Procedures. 
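The stimulation index defined in the legend of Figure 6 (RLUs of treated cells divided by RLUs of untreated cells, with values below ten considered non-mitogenic) can be written out as a short sketch. The function names and example readings are hypothetical; only the ratio and the cutoff of ten come from the legend.

```python
NON_MITOGENIC_CUTOFF = 10.0  # stimulation index below this is non-mitogenic


def stimulation_index(rlu_treated, rlu_untreated):
    """Ratio of BrdU ELISA signal in lectin-treated versus untreated cells."""
    if rlu_untreated <= 0:
        raise ValueError("untreated signal must be positive")
    return rlu_treated / rlu_untreated


def is_mitogenic(rlu_treated, rlu_untreated, cutoff=NON_MITOGENIC_CUTOFF):
    """Apply the cutoff from the Figure 6 legend to one pair of readings."""
    return stimulation_index(rlu_treated, rlu_untreated) >= cutoff


# Illustrative readings only
strong_response = is_mitogenic(5000.0, 100.0)
weak_response = is_mitogenic(900.0, 100.0)
```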

Flow Cytometry to Measure Cellular Activation and Bio-Plex 
Cytokine Assay 

Expression of CD69 was measured by flow cytometry and cytokine production 
following stimulation with lectin by Bio-Plex assay as described in the Supple- 
mental Experimental Procedures. 

Vaginal HIV-1 Transmission 

BLT mice were anesthetized and received 75 μg of H84T BanLec vaginally in a 
volume of 20 μl. 10 min after application of the lectin, the mice were challenged 
vaginally with 175,000 TCIU of HIV-1 JR-CSF. Mice were bled weekly and the 
plasma was analyzed for the presence of viral RNA for 6 weeks as described 
previously (Wahl et al., 2012). 

Glycocluster Synthesis and Assays 

Synthesis of the glycoclusters is described in the Supplemental Experimental 
Procedures. The determination of the relative ability of glycoclusters to inhibit 
lectin binding to a matrix presenting a glycoligand, given as the inhibitory 
concentration (IC) at which the spectrophotometrically determined signal
intensity is reduced by 50% (IC50 value), provides a measure of the engage- 
ment of a lectin in multivalent associations. This value and the sensitivity of 
lectin binding to the surface of cells in culture in the presence of glyco- 
clusters were assayed as described in the Supplemental Experimental 
Procedures. 
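
As a rough illustration of how an IC50 value can be read off an inhibition
curve, the sketch below interpolates log-linearly between the two inhibitor
concentrations that bracket a 50% signal. The function name and its arguments
are hypothetical; the procedure actually used for the glycocluster assays is
the one given in the Supplemental Experimental Procedures.

```python
import math


def ic50_by_interpolation(concentrations, signals, uninhibited_signal):
    """Estimate IC50 by log-linear interpolation (illustrative sketch only).

    concentrations: inhibitor concentrations in ascending order (all > 0)
    signals: measured signal at each concentration
    uninhibited_signal: signal without inhibitor, taken as 100%
    """
    half = 0.5 * uninhibited_signal
    points = list(zip(concentrations, signals))
    for (c_lo, s_lo), (c_hi, s_hi) in zip(points, points[1:]):
        # Find the first interval in which the signal falls through 50%.
        if s_lo >= half >= s_hi and s_lo != s_hi:
            frac = (s_lo - half) / (s_lo - s_hi)
            log_ic50 = math.log10(c_lo) + frac * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_ic50
    return None  # 50% inhibition was not reached in the tested range
```

A lower returned value indicates a more potent inhibitor; `None` signals that
the tested range did not reach half-maximal inhibition.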

NMR Spectroscopy 

All NMR experiments were acquired at 313 K on a 600 MHz spectrometer 
equipped with a triple-resonance cryoprobe. Y46K/V66D BanLec assignment
was obtained using a classical 3D assignment strategy. For a more detailed 
description, see Supplemental Experimental Procedures. 

Crystallization, Data Collection, and Structure Determination 

Following crystallization, data were obtained by LS-CAT, and structure, in the 
presence or absence of dimannoside, was determined as noted in the Supple- 
mental Experimental Procedures. 



MD Simulations 

MD simulations were conducted as described in the Supplemental Experi- 
mental Procedures. All simulations were conducted using the Amber 12 pack- 
age (Case et al., 2005) with the ff99SB*-ILDN force field (Hornak et al., 2006; 
Lindorff-Larsen et al., 2012). The accelerated MD simulations were set up 
following published protocols (Pierce et al., 2012). 

ACCESSION NUMBERS 

The accession number for the crystal structure reported in this paper is depos- 
ited in PDB: 3RFP. The accession numbers for wild-type BanLec, wild-type in 
complex with dimannoside, H84T BanLec mutant, and H84T BanLec mutant in 
complex with dimannoside, respectively, reported in this paper are deposited 
in PDB: 4PIF, 4PIK, 4PIT, 4PIU. 

SUPPLEMENTAL INFORMATION 

Supplemental Information includes Supplemental Experimental Procedures, 
seven figures, and five tables and can be found with this article online at 
http://dx.doi.org/10.1016/j.cell.2015.09.056.

SUMMARY

The microbial adaptive immune system CRISPR 
mediates defense against foreign genetic elements 
through two classes of RNA-guided nuclease 
effectors. Class 1 effectors utilize multi-protein 
complexes, whereas class 2 effectors rely on sin- 
gle-component effector proteins such as the well- 
characterized Cas9. Here, we report characterization 
of Cpf1, a putative class 2 CRISPR effector. We 
demonstrate that Cpf1 mediates robust DNA interference
with features distinct from Cas9. Cpf1 is a single
RNA-guided endonuclease lacking tracrRNA, and it
utilizes a T-rich protospacer-adjacent motif. Moreover,
Cpf1 cleaves DNA via a staggered DNA double-stranded
break. Out of 16 Cpf1-family proteins,
we identified two candidate enzymes from Acid- 
aminococcus and Lachnospiraceae, with efficient 
genome-editing activity in human cells. Identifying 
this mechanism of interference broadens our under- 
standing of CRISPR-Cas systems and advances 
their genome editing applications. 

INTRODUCTION 

Almost all archaea and many bacteria achieve adaptive immunity 
through a diverse set of CRISPR-Cas (clustered regularly interspaced
short palindromic repeats and CRISPR-associated proteins)
systems, each of which consists of a combination of Cas
effector proteins and CRISPR RNAs (crRNAs) (Makarova et al.,
2011, 2015). The defense activity of the CRISPR-Cas systems
includes three stages: (1) adaptation, when a complex of Cas
proteins excises a segment of the target DNA (known as a
protospacer) and inserts it into the CRISPR array (where this
sequence becomes a spacer); (2) expression and processing of
the precursor CRISPR (pre-cr) RNA resulting in the formation of
mature crRNAs; and (3) interference, when the effector module
(either another Cas protein complex or a single large protein)
is guided by a crRNA to recognize and cleave target DNA
(or in some cases, RNA) (Horvath and Barrangou, 2010; Sorek
et al., 2013; Barrangou and Marraffini, 2014). The adaptation stage
is mediated by the complex of the Cas1 and Cas2 proteins, which
are shared by all known CRISPR-Cas systems, and sometimes
involves additional Cas proteins. Diversity is observed at the level of
processing of the pre-crRNA to mature crRNA guides, proceeding
via either a Cas6-related ribonuclease or a housekeeping RNase III
that specifically cleaves double-stranded RNA hybrids of pre-
crRNA and tracrRNA. Moreover, the effector modules differ sub- 
stantially among the CRISPR-Cas systems (Makarova et al., 
2011, 2015; Charpentier et al., 2015). In the latest classification, 
the diverse CRISPR-Cas systems are divided into two classes ac- 
cording to the configuration of their effector modules: class 1 
CRISPR systems utilize several Cas proteins and the crRNA to 
form an effector complex, whereas class 2 CRISPR systems 
employ a large single-component Cas protein in conjunction 
with crRNAs to mediate interference (Makarova et al., 2015). 

Multiple class 1 CRISPR-Cas systems, which include the type I 
and type III systems, have been identified and functionally charac- 
terized in detail, revealing the complex architecture and dynamics 
of the effector complexes (Brouns et al., 2008; Marraffini and Sontheimer,
2008; Hale et al., 2009; Sinkunas et al., 2013; Jackson
et al., 2014; Mulepati et al., 2014). Several class 2 CRISPR-Cas
systems have also been identified and experimentally characterized,
but they are all type II and employ homologous RNA-guided
endonucleases of the Cas9 family as effectors (Barrangou et al.,
2007; Garneau et al., 2010; Deltcheva et al., 2011; Sapranauskas
et al., 2011; Jinek et al., 2012; Gasiunas et al., 2012). A second, putative
class 2 CRISPR system, tentatively assigned to type V, has
been recently identified in several bacterial genomes
(http://www.jcvi.org/cgi-bin/tigrfams/HmmReportPage.cgi?acc=TIGR04330)
(Schunder et al., 2013; Vestergaard et al., 2014; Makarova et al.,
2015). The putative type V CRISPR-Cas systems contain a large,
~1,300 amino acid protein called Cpf1 (CRISPR from Prevotella
and Francisella 1). It remains unknown, however, whether Cpf1-containing
CRISPR loci indeed represent functional CRISPR
systems. Given the broad applications of Cas9 as a genome-engineering
tool (Hsu et al., 2014; Jiang and Marraffini, 2015),
we sought to explore the function of Cpf1-based putative CRISPR
systems.

Here, we show that Cpf1-containing CRISPR-Cas loci of
Francisella novicida U112 encode functional defense systems
capable of mediating plasmid interference in bacterial cells
guided by the CRISPR spacers. Compared with Cas9 systems, Cpf1-containing
CRISPR systems have three distinguishing features. First, Cpf1-associated
CRISPR arrays are processed into mature crRNAs without
the requirement of an additional trans-activating crRNA
(tracrRNA) (Deltcheva et al., 2011; Chylinski et al., 2013). Second,
Cpf1-crRNA complexes efficiently cleave target DNA
preceded by a short T-rich protospacer-adjacent motif (PAM), in
contrast to the G-rich PAM following the target DNA for Cas9
systems. Third, Cpf1 introduces a staggered DNA double-stranded
break with a 4 or 5-nt 5' overhang.



Figure 1. The Francisella novicida U112
Cpf1 CRISPR Locus Provides Immunity
against Transformation of Plasmids Containing
Protospacers Flanked by a 5'-TTN
PAM

(A) Organization of two CRISPR loci found
in Francisella novicida U112 (NC_008601). The
domain architectures of FnCas9 and FnCpf1 are
compared.

(B) Schematic illustrating the plasmid depletion
assay for discovering the PAM position and
identity. Competent E. coli harboring either the
heterologous FnCpf1 locus plasmid (pFnCpf1) or
the empty vector control were transformed with a
library of plasmids containing the matching protospacer
flanked by randomized 5' or 3' PAM sequences
and selected with antibiotic to deplete
plasmids carrying successfully targeted PAM.
Plasmids from surviving colonies were extracted
and sequenced to determine depleted PAM
sequences.

(C and D) Sequence logo for the FnCpf1 PAM as
determined by the plasmid depletion assay. Letter
height at each position is measured by information
content (C) or frequency (D); error bars show 95%
Bayesian confidence interval.

(E) E. coli harboring pFnCpf1 provides robust
interference against plasmids carrying 5'-TTN
PAMs (n = 3; error bars represent mean ± SEM).
See also Figure S1.



To explore the suitability of Cpf1 for
genome-editing applications, we characterized
the RNA-guided DNA-targeting
requirements for 16 Cpf1-family proteins
from diverse bacteria, and we identified
two Cpf1 enzymes from Acidaminococcus sp. BV3L6 and
Lachnospiraceae bacterium ND2006 that are capable of mediating
robust genome editing in human cells. Collectively, these
results establish Cpf1 as a class 2 CRISPR-Cas system that
includes an effective single RNA-guided endonuclease with
distinct properties that has the potential to substantially advance
our ability to manipulate eukaryotic genomes.

RESULTS 

Cpf1-Containing CRISPR Loci Are Active Bacterial
Immune Systems

Cpf1 was first annotated as a CRISPR-associated gene in TIGRFAM
(http://www.jcvi.org/cgi-bin/tigrfams/HmmReportPage.cgi?acc=TIGR04330)
and has been hypothesized to be the
effector of a CRISPR locus that is distinct from the Cas9-containing
type II CRISPR-Cas loci that are also present in the genomes
of some of the same bacteria, such as multiple strains of
Francisella and Prevotella (Schunder et al., 2013; Vestergaard
et al., 2014; Makarova et al., 2015) (Figure 1A). The Cpf1 protein
contains a predicted RuvC-like endonuclease domain that is
distantly related to the respective nuclease domain of Cas9.
However, Cpf1 differs from Cas9 in that it lacks a second,
HNH endonuclease domain, which is inserted within the
RuvC-like domain of Cas9. Furthermore, the N-terminal portion
of Cpf1 is predicted to adopt a mixed α/β structure and appears
to be unrelated to the N-terminal, α-helical recognition lobe of
Cas9 (Figure 1A). It has been shown that the nuclease moieties
of Cas9 and Cpf1 are homologous to distinct groups of transposon-encoded
TnpB proteins, the first one containing both
RuvC and HNH nuclease domains and the second one containing
the RuvC-like domain only (Makarova and Koonin, 2015).
Apart from these distinctions between the effector proteins,
the Cpf1-carrying loci encode Cas1, Cas2, and Cas4 proteins
that are more closely related to orthologs from types I and III
than to those from type II CRISPR systems (Makarova et al.,
2015). Taken together, these differences from type II have
prompted the classification of Cpf1-encoding CRISPR-Cas loci
as the putative type V within class 2 (Makarova et al., 2015).
The features of the putative type V loci, especially the domain
architecture of Cpf1, suggest not only that type II and type V systems
independently evolved through the association of different
adaptation modules (cas1, cas2, and cas4 genes) with different
TnpB genes, but also that type V systems are functionally
unique. The notion that Cpf1-carrying loci are bona fide CRISPR
systems is further buttressed by the search of microbial genome
sequences for similarity to the type V spacers that produced
several significant hits to prophage genes, in particular those
from Francisella (Schunder et al., 2013). Given these observations
and the prevalence of Cpf1-family proteins in diverse
bacterial species, we sought to test the hypothesis that Cpf1-encoding
CRISPR-Cas loci are biologically active and can mediate
targeted DNA interference, one of the primary functions of
CRISPR systems.

To simplify experimentation, we cloned the Francisella
novicida U112 Cpf1 (FnCpf1) locus (Figure 1A) into low-copy
plasmids (pFnCpf1) to allow heterologous reconstitution in
Escherichia coli. Typically, in currently characterized CRISPR-Cas
systems, there are two requirements for DNA interference:
(1) the target sequence has to match one of the spacers present
in the respective CRISPR array, and (2) the target sequence
complementary to the spacer (hereinafter protospacer) has to
be flanked by the appropriate protospacer adjacent motif
(PAM). Given the completely uncharacterized functionality of
the FnCpf1 CRISPR locus, we adapted a previously described
plasmid depletion assay (Jiang et al., 2013) to ascertain the activity
of Cpf1 and identify the requirement for a PAM sequence
and its respective location relative to the protospacer (5' or 3')
(Figure 1B). We constructed two libraries of plasmids carrying
a protospacer matching the first spacer in the FnCpf1 CRISPR
array with the 5' or 3' 7 bp sequences randomized. Each plasmid
library was transformed into E. coli that heterologously expressed
the FnCpf1 locus or into a control E. coli strain carrying
the empty vector. Using this assay, we determined the PAM
sequence and location by identifying nucleotide motifs that are
preferentially depleted in cells heterologously expressing the
FnCpf1 locus. We found that the PAM for FnCpf1 is located upstream
of the 5' end of the displaced strand of the protospacer
and has the sequence 5'-TTN (Figures 1C, 1D, and S1). The 5'
location of the PAM is also observed in type I CRISPR systems,
but not in type II systems, where Cas9 employs PAM sequences
that are located on the 3' end of the protospacer (Mojica et al.,
2009; Garneau et al., 2010). Beyond the identification of the
PAM, the results of the depletion assay clearly indicate that heterologously
expressed Cpf1 loci are capable of efficient interference
with plasmid DNA.
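
The logic of the depletion readout can be sketched in a few lines: count each
candidate PAM among the plasmids recovered from control cells and from
FnCpf1-expressing cells, normalize to frequencies, and rank PAMs by the log2
ratio of the two. This is only a minimal sketch under stated assumptions, not
the analysis pipeline used for Figure 1; the function names, the pseudocount,
and the threshold are hypothetical.

```python
import math
from collections import Counter


def pam_depletion(control_pams, cpf1_pams, pseudocount=1.0):
    """Return {pam: log2(control_freq / cpf1_freq)}; larger = more depleted."""
    control = Counter(control_pams)
    treated = Counter(cpf1_pams)
    all_pams = set(control) | set(treated)
    # A pseudocount keeps the ratio finite when a PAM is fully depleted.
    n_control = sum(control.values()) + pseudocount * len(all_pams)
    n_treated = sum(treated.values()) + pseudocount * len(all_pams)
    scores = {}
    for pam in all_pams:
        f_control = (control[pam] + pseudocount) / n_control
        f_treated = (treated[pam] + pseudocount) / n_treated
        scores[pam] = math.log2(f_control / f_treated)
    return scores


def depleted_pams(scores, min_log2_ratio=2.0):
    """PAMs whose depletion score meets an arbitrary illustrative threshold."""
    return sorted(pam for pam, score in scores.items() if score >= min_log2_ratio)
```

In this sketch, a PAM that is abundant in the control library but rare among
plasmids recovered from FnCpf1-expressing cells receives a high score, which is
the signature expected for a motif that licenses interference.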

To further characterize the PAM requirements, we analyzed
plasmid interference activity by transforming cpf1-locus-expressing
cells with plasmids carrying protospacer 1 flanked by 5'-TTN
PAMs. We found that all 5'-TTN PAMs were efficiently targeted
(Figure 1E). In addition, 5'-CTA, but not 5'-TCA, was also efficiently
targeted (Figure 1E), suggesting that the middle T is more critical
for PAM recognition than the first T and that, in agreement with
the sequence motifs depleted in the PAM discovery assay (Figure
S1D), the PAM might be more relaxed than 5'-TTN.

The Cpf1-Associated CRISPR Array Is Processed
Independent of TracrRNA

After showing that cpf1-based CRISPR loci are able to mediate
robust DNA interference, we performed small RNA sequencing
to determine the exact identity of the crRNA produced by these
loci. By sequencing small RNAs extracted from a Francisella
novicida U112 culture, we found that the CRISPR array is processed
into short mature crRNAs of 42-44 nt in length. Each
mature crRNA begins with 19 nt of the direct repeat followed
by 23-25 nt of the spacer sequence (Figure 2A). This crRNA
arrangement contrasts with that of type II CRISPR-Cas systems
in which the mature crRNA starts with 20-24 nt of spacer
sequence followed by ~22 nt of direct repeat (Deltcheva et al.,
2011; Chylinski et al., 2013). Unexpectedly, apart from the
crRNAs, we did not observe any robustly expressed small transcripts
near the Francisella cpf1 locus that might correspond to
tracrRNAs, which are associated with Cas9-based systems.
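
The repeat-then-spacer arrangement described above can be written down as a
small helper that joins the retained 3' portion of the direct repeat to the
spacer. This is an illustrative sketch only: the function name is hypothetical,
and any sequences passed to it in examples are placeholders rather than the
FnCpf1 repeat or spacer.

```python
def assemble_cpf1_crrna(direct_repeat, spacer, repeat_len=19):
    """Compose a mature Cpf1-style crRNA (5'->3'): repeat portion, then spacer.

    The last `repeat_len` nt of the direct repeat are kept, followed by a
    spacer-derived portion of 23-25 nt, giving 42-44 nt in total.
    """
    if len(direct_repeat) < repeat_len:
        raise ValueError("direct repeat is shorter than the retained portion")
    if not 23 <= len(spacer) <= 25:
        raise ValueError("spacer portion is expected to be 23-25 nt")
    return direct_repeat[-repeat_len:] + spacer
```

For a type II system the order would be reversed (spacer first, repeat second),
which is the contrast the text draws.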

To confirm that no additional RNAs are required for crRNA
maturation and DNA interference, we constructed an expression
plasmid using synthetic promoters to drive the expression of
Francisella cpf1 (FnCpf1) and the CRISPR array (pFnCpf1_min).
Small RNA-seq of E. coli expressing this plasmid still showed
robust processing of the CRISPR array into mature crRNA (Figure
2B), indicating that FnCpf1 and its CRISPR array are the
only elements required from the FnCpf1 locus to achieve crRNA
processing. Furthermore, E. coli expressing pFnCpf1_min as
well as pFnCpf1_ΔCas, a plasmid with all of the cas genes
removed but retaining native promoters driving the expression
of FnCpf1 and the CRISPR array, also exhibited robust DNA
interference, demonstrating that FnCpf1 and crRNA are sufficient
for mediating DNA targeting (Figure 2C). By contrast,
Cas9 requires both crRNA and tracrRNA to mediate targeted
DNA interference (Deltcheva et al., 2011; Zhang et al., 2013).

Cpf1 Is a Single crRNA-Guided Endonuclease

The finding that FnCpf1 can mediate DNA interference with
crRNA alone is highly surprising given that Cas9 recognizes
crRNA through the duplex structure between crRNA and
tracrRNA (Jinek et al., 2012; Nishimasu et al., 2014), as well as
the 3' secondary structure of the tracrRNA (Hsu et al., 2013;
Nishimasu et al., 2014). To ensure that crRNA is indeed sufficient
for forming an active complex with FnCpf1 and mediating RNA-guided
DNA cleavage, we investigated whether FnCpf1 supplied
only with crRNA can cleave target DNA in vitro. We purified
FnCpf1 (Figure S2) and assayed its ability to cleave the same
protospacer-1-containing plasmid used in the bacterial DNA
interference experiments (Figure 3A). We found that FnCpf1
along with an in-vitro-transcribed mature crRNA targeting protospacer 1
was able to efficiently cleave the target plasmid in a
Mg2+- and crRNA-dependent manner (Figure 3B). Moreover,
FnCpf1 was able to cleave both supercoiled and linear target
DNA (Figure 3C). These results clearly demonstrate the sufficiency
of FnCpf1 and crRNA for RNA-guided DNA cleavage.

Figure 2. Heterologous Expression of
FnCpf1 and CRISPR Array in E. coli Is Sufficient
to Mediate Plasmid DNA Interference
and crRNA Maturation

(A) Small RNA-seq of Francisella novicida U112
reveals transcription and processing of the FnCpf1
CRISPR array. The mature crRNA begins with a
19-nt partial direct repeat followed by 23-25 nt of
spacer sequence.

(B) Small RNA-seq of E. coli transformed with a
plasmid carrying synthetic promoter-driven
FnCpf1 and CRISPR array shows crRNA processing
independent of Cas genes and other
sequence elements in the FnCpf1 locus.

(C) E. coli harboring different truncations of the
FnCpf1 CRISPR locus shows that only FnCpf1 and
the CRISPR array are required for plasmid DNA
interference (n = 3; error bars show mean ± SEM).

We also mapped the cleavage site of FnCpf1 using Sanger
sequencing of the cleaved DNA ends. We found that FnCpf1-mediated
cleavage results in a 5-nt 5' overhang (Figures 3A,
3D, and S3A-S3D), which is different from the blunt cleavage
product generated by Cas9 (Garneau et al., 2010; Jinek et al.,
2012; Gasiunas et al., 2012). The staggered cleavage site of
FnCpf1 is distant from the PAM: cleavage occurs after the 18th
base on the non-targeted (+) strand and after the 23rd base on
the targeted (-) strand (Figures 3A, 3D, and S3A-S3D). Using
double-stranded oligo substrates with different PAM sequences,
we also found that FnCpf1 requires the 5'-TTN
PAM to be in a duplex form in order
to cleave the target DNA (Figure 3E).
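
The targeting geometry described above (a 5'-TTN PAM immediately upstream of
the protospacer on the non-targeted strand, with cuts after base 18 on that
strand and after base 23 on the targeted strand) can be illustrated with a
short scan over a DNA string. This is a sketch under stated assumptions: the
function name, the default protospacer length of 24 nt, and the coordinate
convention are hypothetical, only one strand is scanned (the reverse complement
would have to be passed separately), and the strict TTN match ignores the
possibility, noted earlier, that the PAM is more relaxed.

```python
def find_cpf1_sites(seq, protospacer_len=24, cut_plus=18, cut_minus=23):
    """Scan `seq` (5'->3', treated as the non-targeted strand) for TTN PAMs."""
    seq = seq.upper()
    sites = []
    for i in range(len(seq) - 3 - protospacer_len + 1):
        if seq[i:i + 2] == "TT":  # 5'-TTN: the third PAM base is unconstrained
            start = i + 3  # first protospacer base, directly 3' of the PAM
            sites.append({
                "pam": seq[i:i + 3],
                "protospacer": seq[start:start + protospacer_len],
                # 0-based index in `seq` of the first base 3' of each cut
                "cut_plus": start + cut_plus,
                "cut_minus": start + cut_minus,
                "overhang_nt": cut_minus - cut_plus,
            })
    return sites
```

With the default offsets the two cut positions are 5 nt apart, matching the
5-nt 5' overhang reported for FnCpf1.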

The RuvC-like Domain of Cpf1
Mediates RNA-Guided DNA
Cleavage

The RuvC-like domain of Cpf1 retains all
of the catalytic residues of this family of
endonucleases (Figures 4A and S4) and
is thus predicted to be an active nuclease.
Therefore, we generated three mutants,
FnCpf1(D917A), FnCpf1(E1006A), and
FnCpf1(D1255A) (Figure 4A), to test
whether the conserved catalytic residues
are essential for the nuclease activity of
FnCpf1. We found that the D917A and
E1006A mutations completely inactivated
the DNA cleavage activity of FnCpf1, and
D1255A significantly reduced nucleolytic
activity (Figure 4B). These results are in contrast to the mutagenesis
results for Streptococcus pyogenes Cas9 (SpCas9), where
mutation of the RuvC (D10A) and HNH (N863A) nuclease domains
converts SpCas9 into a DNA nickase (i.e., inactivation of
each of the two nuclease domains abolished the cleavage of
one of the DNA strands) (Jinek et al., 2012; Gasiunas et al.,
2012) (Figure 4B). These findings suggest that the RuvC-like
domain of FnCpf1 cleaves both strands of the target DNA,
perhaps in a dimeric configuration. Interestingly, size-exclusion
gel filtration of FnCpf1 shows that the protein is eluted at a size
of ~300 kD, twice the molecular weight of a FnCpf1 monomer
(Figure S2B).

Sequence and Structural Requirements for the
Cpf1 crRNA

Compared with the guide RNA for Cas9, which has elaborate
RNA secondary structure features that interact with Cas9 (Nishimasu
et al., 2014), the guide RNA for FnCpf1 is notably simpler
and consists only of a single stem loop in the direct repeat
sequence (Figure 3A). We explored the sequence and structural
requirements of crRNA for mediating DNA cleavage with FnCpf1.

We first examined the length requirement for the guide
sequence and found that FnCpf1 requires at least 16 nt of guide
sequence to achieve detectable DNA cleavage and a minimum
of 18 nt of guide sequence to achieve efficient DNA cleavage
in vitro (Figure 5A). These requirements are similar to those
demonstrated for SpCas9, in which a minimum of 16-17 nt of
spacer sequence is required for DNA cleavage (Cencic et al.,
2014; Fu et al., 2014). We also found that the seed region of
the FnCpf1 guide RNA is approximately within the first 5 nt on
the 5' end of the spacer sequence (Figures 5B and S3E).

Next, we studied the effect of direct repeat mutations on the
RNA-guided DNA cleavage activity. The direct repeat portion
of mature crRNA is 19 nt long (Figure 2A). Truncation of the direct
repeat revealed that at least 16, but optimally more than 17 nt, of
the direct repeat is required for cleavage (Figure 5C). Mutations in the stem
loop that preserved the RNA duplex did not affect the cleavage
activity, whereas mutations that disrupted the stem loop duplex
structure completely abolished cleavage (Figure 5D). Finally,
base substitutions in the loop region did not affect nuclease
activity, whereas the uracil base immediately preceding the
spacer sequence could not be substituted (Figure 5E). Collectively,
these results suggest that FnCpf1 recognizes the crRNA
through a combination of sequence-specific and structural features
of the stem loop.

Cpf1-Family Proteins from Diverse Bacteria Share
Common crRNA Structures and PAMs

Based on our previous experience in harnessing Cas9 for 
genome editing in mammalian cells, only a small fraction of bac- 
terial nucleases can function efficiently when heterologously ex- 
pressed in mammalian cells (Cong et al., 2013; Ran et al., 2015). 



Figure 3. FnCpf1 Is Guided by crRNA to
Cleave DNA In Vitro

(A) Schematic of the FnCpf1 crRNA-DNA-targeting
complex. Cleavage sites are indicated by red
arrows.

(B) FnCpf1 and crRNA alone mediated RNA-guided
cleavage of target DNA in a crRNA- and
Mg2+-dependent manner.

(C) FnCpf1 cleaves both linear and supercoiled
DNA.

(D) Sanger-sequencing traces from FnCpf1-digested
target show staggered overhangs. The
non-templated addition of an additional adenine,
denoted as N, is an artifact of the polymerase used
in sequencing (Clark, 1988). Reverse primer read
represented as reverse complement to aid visualization.
See also Figure S3.

(E) Dependency of cleavage on base-pairing at the
5' PAM. FnCpf1 can only recognize the PAM in
correctly Watson-Crick-paired DNA.

See also Figures S2 and S3.

Therefore, in order to assess the feasibility
of harnessing Cpf1 as a genome-editing
tool, we exploited the diversity of
Cpf1-family proteins available in the public
sequence databases. A BLAST search of the WGS database
at the NCBI revealed 46 non-redundant Cpf1-family proteins
(Figure S5A), from which we chose 16 candidates that, based
on our phylogenetic reconstruction (Figure S5A), represented
the entire Cpf1 diversity (Figures 6A and S5). These Cpf1-family
proteins span a range of lengths between ~1,200 and ~1,500
amino acids.

The direct repeat sequences for each of these Cpf1-family
proteins show strong conservation in the 19 nt at the 3' end of the
direct repeat, the portion of the repeat that is included in the
processed crRNA (Figure 6B). The 5' sequence of the direct
repeat is much more diverse. Of the 16 Cpf1-family proteins chosen
for analysis, three (2, Lachnospiraceae bacterium MC2017,
Lb3Cpf1; 3, Butyrivibrio proteoclasticus, BpCpf1; and 6,
Smithella sp. SC_K08D17, SsCpf1) were associated with direct
repeat sequences that are notably divergent from the FnCpf1
direct repeat (Figure 6B). However, even these direct repeat
sequences preserved stem-loop structures that were identical
or nearly identical to the FnCpf1 direct repeat (Figure 6C).

Given the strong structural conservation of the direct repeats
that are associated with many of the Cpf1-family proteins, we
first tested whether the orthologous direct repeat sequences
are able to support FnCpf1 nuclease activity in vitro. As expected,
the direct repeats that contained conserved stem
sequences were able to function interchangeably with FnCpf1.
By contrast, the direct repeats from candidates 2 (Lb3Cpf1)
and 6 (SsCpf1) were unable to support FnCpf1 cleavage activity
(Figure 6D). The direct repeat from candidate 3 (BpCpf1) supported
only a low level of FnCpf1 nuclease activity (Figure 6D),
possibly due to the conservation of the 3'-most U.

Next, we applied the in vitro PAM identification assay (Figure
S6A) to determine the PAM sequence for each Cpf1-family
protein. We were able to identify the PAM sequence for seven
new Cpf1-family proteins (Figures 6E, S6B, and S6C), and the
screen confirmed the PAM for FnCpf1 as 5'-TTN. The remaining
eight tested Cpf1 proteins did not show efficient cleavage during
in vitro reconstitution. The PAM sequences for the Cpf1-family
proteins were predominantly T rich, only varying in the number
of Ts constituting each PAM (Figures 6E, S6B, and S6C).

Figure 4. Catalytic Residues in the C-Terminal RuvC Domain of
FnCpf1 Are Required for DNA Cleavage

(A) Domain structure of FnCpf1 with RuvC catalytic residues highlighted. The
catalytic residues were identified based on sequence homology to Thermus
thermophilus RuvC (PDB: 4EP5).

(B) Native TBE PAGE gel showing that mutation of the RuvC catalytic residues
of FnCpf1 (D917A and E1006A) and mutation of the RuvC (D10A) catalytic
residue of SpCas9 prevents double-stranded DNA cleavage. Denaturing TBE-Urea
PAGE gel showing that mutation of the RuvC catalytic residues of FnCpf1
(D917A and E1006A) prevents DNA-nicking activity, whereas mutation of the
RuvC (D10A) catalytic residue of SpCas9 results in nicking of the target site.
See also Figure S4.

Cpf1 Can Be Harnessed to Facilitate Genome Editing in
Human Cells

We tested each Cpf1-family protein for which we were able to
identify a PAM for nuclease activity in mammalian cells. We
codon optimized each of these genes and attached a C-terminal
nuclear localization signal (NLS) for optimal expression and nuclear
targeting in human cells (Figure 7A). To test the activity of
each Cpf1-family protein, we selected a guide RNA target site
within the DNMT1 gene (Figure 7B). We first found that each of
the Cpf1-family proteins along with its respective crRNA designed
to target DNMT1 was able to cleave a PCR amplicon
of the DNMT1 genomic region in vitro (Figure 7C). However,
when tested in human embryonic kidney 293FT (HEK293FT)
cells, only two out of the eight Cpf1-family proteins (7, AsCpf1
and 13, LbCpf1) exhibited detectable levels of nuclease-induced
indels (Figures 7C and 7D). This result is consistent with previous
experiments with Cas9 in which only a small number of Cas9
orthologs were successfully harnessed for genome editing in
mammalian cells (Ran et al., 2015).

We further tested each Cpf1-family protein with additional
genomic targets and found that AsCpf1 and LbCpf1 consistently
mediated robust genome editing in HEK293FT cells, whereas the
remaining Cpf1 proteins showed either no detectable activity or
only sporadic activity (Figures 7E and S7) despite robust expression
(Figure S6D). The only Cpf1 candidate that expressed
poorly was PdCpf1 (Figure S6D). When compared to Cas9,
AsCpf1 and LbCpf1 mediated comparable levels of indel formation
(Figure 7E). Additionally, we used in vitro cleavage followed
by Sanger sequencing of the cleaved DNA ends and found that
7, AsCpf1 and 13, LbCpf1 also generated staggered cleavage
sites (Figures S6E and S6F, respectively).

DISCUSSION 

In this work, we characterize Cpf1-containing class 2 CRISPR
systems, classified as type V, and show that their effector protein,
Cpf1, is a single RNA-guided endonuclease. Cpf1 substantially
differs from Cas9 (to date, the only other experimentally characterized
class 2 effector) in terms of structure and function and
might provide important advantages for genome-editing applications.
Specifically, Cpf1 contains a single identified nuclease
domain, in contrast to the two nuclease domains present in
Cas9. The results presented here show that, in FnCpf1, inactivation
of the RuvC-like domain abolishes cleavage of both DNA
strands. Conceivably, FnCpf1 forms a homodimer (Figure S2B),
with the RuvC-like domains of each of the two subunits cleaving
one DNA strand. However, we cannot rule out that FnCpf1 contains
a second yet-to-be-identified nuclease domain. Structural
characterization of Cpf1-RNA-DNA complexes will allow testing
of these hypotheses and elucidation of the cleavage mechanism.

Perhaps the most notable feature of Cpf1 is that it is a single
crRNA-guided endonuclease. Unlike Cas9, which requires
tracrRNA to process crRNA arrays and both crRNA and
tracrRNA to mediate interference (Deltcheva et al., 2011), Cpf1
processes crRNA arrays independent of tracrRNA, and Cpf1-crRNA
complexes alone cleave target DNA molecules, without
the requirement for any additional RNA species. This feature
could simplify the design and delivery of genome-editing tools.
For example, the shorter (~42 nt) crRNA employed by Cpf1
has practical advantages over the long (~100 nt) guide RNA in
Cas9-based systems because shorter RNA oligos are significantly
easier and cheaper to synthesize. In addition, these findings
raise more fundamental questions regarding the guide
processing mechanism of the type V CRISPR-Cas systems. In
the case of type II, processing of the pre-crRNA is catalyzed
by the bacterial RNase III, which recognizes the long duplex
formed by the tracrRNA and the complementary portion of the
direct repeat (Deltcheva et al., 2011). Such long duplexes
are not present in the pre-crRNA of type V systems, making it
unlikely that RNase III is responsible for processing. Further experiments
aimed at elucidating the processing mechanism of
type V systems will shed light on the functional diversity of
different CRISPR-Cas systems.

Figure 5. crRNA Requirements for FnCpf1 Nuclease Activity In Vitro

(A) Effect of spacer length on FnCpf1 cleavage activity.

(B) Effect of crRNA-target DNA mismatch on FnCpf1 cleavage activity. See also Figure S3E.

(C) Effect of direct repeat length on FnCpf1 cleavage activity.

(D) FnCpf1 cleavage activity depends on secondary structure in the stem of the direct repeat RNA structure.

(E) FnCpf1 cleavage activity is unaffected by loop mutations but is sensitive to mutation in the 3'-most base of the direct repeat.
See also Figure S4.

Cpf1 generates a staggered cut with a 5' overhang, in contrast
to the blunt ends generated by Cas9 (Garneau et al., 2010; Jinek
et al., 2012; Gasiunas et al., 2012). This structure of the cleavage
product could be particularly advantageous for facilitating
non-homologous end joining (NHEJ)-based gene insertion into the
mammalian genome (Maresca et al., 2013). Being able to program
the exact sequence of a sticky end would allow researchers
to design the DNA insert so that it integrates into the
genome in the proper orientation. Specifically, in non-dividing
cells, in which genome editing via homology-directed repair
(HDR) mechanisms is especially challenging (Chan et al.,
2011), Cpf1 could provide an effective way to precisely introduce
DNA into the genome via non-HDR mechanisms.

Another potentially useful feature of Cpf1 that might aid the
introduction of new DNA sequences is that Cpf1 cleaves target
DNA at the distal end of the protospacer, far away from the
seed region. Therefore, Cpf1-induced indels will be located far
from the target site, which is thus preserved for subsequent 
rounds of Cpf1 cleavage. With Cas9, any indel resulting from 
the dominant NHEJ repair pathway will disrupt the target site, 
effectively eliminating the possibility of inserting new DNA at 
that site in that particular cell. In the case of Cpf1, it appears 
possible that, if the first round of targeting results in an indel, a 
subsequent round of targeting could yet be repaired via HDR. 
Future exploration of these and other strategies using Cpf1 
and other class 2 effectors is expected to bring solutions for 
some of the biggest challenges facing genome editing. 

The T-rich PAMs of the Cpf1 family also allow for applications
in genome editing in organisms with particularly AT-rich genomes,
such as Plasmodium falciparum (Gardner et al., 2002),
or areas of interest with AT enrichment, such as scaffold/matrix
attachment regions. To date, all characterized mammalian
genome-editing proteins require the presence of at least one G
(Hsu et al., 2014; Jiang et al., 2015), so the T- and T/C-dependent
PAMs of Cpf1-family proteins expand the targeting range of
RNA-guided genome editing nucleases.

The natural diversity of CRISPR systems provides a wealth of 
opportunities for understanding the origin and evolution of pro- 
karyotic adaptive immunity, as well as for harnessing potentially 
transformative biotechnological tools. There is little doubt that, 
beyond the already classified and characterized diversity of the 
CRISPR-Cas types, there are additional systems with distinctive 
characteristics that await exploration and could further enhance 
genome editing and other areas of biotechnology as well as shed 
further light on the evolution of these defense systems. 

EXPERIMENTAL PROCEDURES

Generation of Heterologous Plasmids 

To generate the FnCpf1 locus for heterologous expression, genomic DNA from
Francisella novicida (generous gift from Wayne Conlan) was PCR amplified
using Herculase II polymerase (Agilent Technologies) and cloned into 
pACYC-184 using Gibson cloning (New England Biolabs). Cells harboring 
plasmids were made competent using the Z-competent kit (Zymo). Sequences 
of all bacterial expression plasmids can be found in Table S1. 

Bacterial RNA Sequencing 

RNA was isolated from stationary-phase bacteria by first resuspending
F. novicida (generous gift from David Weiss) or E. coli in TRIzol and then
homogenizing the bacteria with zirconia/silica beads (BioSpec Products) in a
BeadBeater (BioSpec Products) for three 1-min cycles. Total RNA was purified
from homogenized samples with the Direct-Zol RNA miniprep protocol (Zymo),
DNase treated with TURBO DNase (Life Technologies), and 3' dephosphorylated
with T4 Polynucleotide Kinase (New England Biolabs). rRNA was
removed with the bacterial Ribo-Zero rRNA removal kit (Illumina). RNA libraries
were prepared from rRNA-depleted RNA using NEBNext Small RNA Library
Prep Set for Illumina (New England Biolabs) and size selected using the Pippin
Prep (Sage Science).

For heterologous E. coli expression of the FnCpf1 locus, RNA-sequencing
libraries were prepared from rRNA-depleted RNA using a derivative of the previously
described CRISPR RNA-sequencing method (Heidrich et al., 2015). In
brief, transcripts were poly-A tailed with E. coli Poly(A) Polymerase (New
England Biolabs), ligated with 5' RNA adapters using T4 RNA Ligase 1 (ssRNA
Ligase) High Concentration (New England Biolabs), and reverse transcribed
with AffinityScript Multiple Temperature Reverse Transcriptase (Agilent Technologies).
cDNA was PCR amplified with barcoded primers using Herculase II
polymerase (Agilent Technologies).

RNA-Sequencing Analysis 

The prepared cDNA libraries were sequenced on a MiSeq (Illumina). Reads
from each sample were identified on the basis of their associated barcode 
and aligned to the appropriate RefSeq reference genome using BWA (Li and 
Durbin, 2009). Paired-end alignments were used to extract entire transcript se- 
quences using Picard tools (http://broadinstitute.github.io/picard), and these 
sequences were analyzed using Geneious 8.1.5 (Biomatters). 

In Vivo FnCpf1 PAM Screen

Randomized PAM plasmid libraries were constructed using synthesized oligonucleotides
(IDT) consisting of eight or seven randomized nucleotides either
upstream or downstream, respectively, of the FnCpf1 spacer 1. The randomized
ssDNA oligos (Table S1) were made double stranded by annealing to a
short primer and using the large Klenow fragment (New England Biolabs) for
second-strand synthesis. The dsDNA product was assembled into a linearized
pUC19 using Gibson cloning (New England Biolabs). Competent Stbl3 E. coli
(Invitrogen) were transformed with the cloned products, and >10^ cells were 
collected and pooled. Plasmid DNA was harvested using a Maxi-prep kit 
(QIAGEN). We transformed 30 ng of the pooled library into E. coli cells carrying
the FnCpf1 locus or pACYC184 control. After transformation, cells were plated
on ampicillin. After 16 hr of growth, >4 × 10^6 cells were harvested and plasmid
DNA was extracted using a Maxi-prep kit (QIAGEN). The target PAM region
was amplified and sequenced using a MiSeq (Illumina) with a single-end
150 cycle kit. 

Computational PAM Discovery Pipeline 

PAM regions were extracted, counted, and normalized to total reads for each
sample. For a given PAM, enrichment was measured as the log ratio compared
to pACYC184 control, with a 0.01 pseudocount adjustment. PAMs above a 3.5
enrichment threshold were collected and used to generate sequence logos
(Crooks et al., 2004).
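
The scoring step described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the pipeline used in the study: the text does not give the base of the logarithm, the direction of the ratio, or the scale at which the pseudocount is applied, so the sketch assumes log2, takes control over screened frequency (so that PAMs depleted by cleavage score positive), and adds the pseudocount to read fractions. The function names and the toy 3-nt PAM counts are hypothetical.

```python
import math


def normalize(counts):
    """Convert raw PAM read counts into fractions of total reads."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads in sample")
    return {pam: n / total for pam, n in counts.items()}


def pam_log_ratios(screened_counts, control_counts, pseudocount=0.01):
    """Log ratio of control to screened frequency for every PAM in the control.

    ASSUMPTIONS: log base 2, control/screened direction, and pseudocount
    added to read fractions. None of these is specified in the text.
    """
    screened = normalize(screened_counts)
    control = normalize(control_counts)
    ratios = {}
    for pam, control_freq in control.items():
        screened_freq = screened.get(pam, 0.0)
        ratios[pam] = math.log2(
            (control_freq + pseudocount) / (screened_freq + pseudocount)
        )
    return ratios


def select_pams(ratios, threshold=3.5):
    """Return PAMs whose score exceeds the threshold, sorted alphabetically."""
    return sorted(pam for pam, score in ratios.items() if score > threshold)


# Toy data (hypothetical): two T-rich PAMs are depleted in the screened sample.
control = {"TTA": 250, "TTC": 250, "GGA": 250, "CCA": 250}
screened = {"TTA": 2, "TTC": 3, "GGA": 500, "CCA": 495}

ratios = pam_log_ratios(screened, control)
hits = select_pams(ratios)
print(hits)
```

The selected PAMs would then be passed to a sequence-logo tool; that step is omitted here.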

PAM Validation 

Sequences corresponding to both PAMs and non-PAMs were cloned into digested
pUC19 and ligated with T4 ligase (Enzymatics). Competent E. coli with
either the FnCpf1 locus plasmid or pACYC184 control plasmid were transformed
with 20 ng of PAM plasmid and plated on LB agar plates supplemented
with ampicillin and chloramphenicol. Colonies were counted after 18 hr.



Figure 6. Analysis of Cpf1-Family Protein Diversity and Function

(A) Phylogenetic tree of 16 Cpf1 orthologs selected for functional analysis. Conserved sequences are shown in dark gray. The RuvC domain, helical region, and
zinc finger are highlighted.

(B) Alignment of direct repeats from the 16 Cpf1-family proteins. Sequences that are removed post crRNA maturation are colored gray. Non-conserved bases are
colored red. The stem duplex is highlighted in gray.

(C) RNAfold (Lorenz et al., 2011) prediction of the direct repeat sequence in the mature crRNA. Predictions for FnCpf1 along with three diverged type V loci are
shown.

(D) Type V crRNAs from different bacteria with similar direct repeat sequences are able to function with FnCpf1 to mediate target DNA cleavage.

(E) PAM sequences for eight Cpf1-family proteins identified using in vitro cleavage of a plasmid library containing randomized PAMs flanking the protospacer.
See also Figures S5 and S6.

Figure 7. Cpf1 Mediates Robust Genome Editing in Human Cell Lines

(A) Eight Cpf1-family proteins were individually expressed in HEK293FT cells using CMV-driven expression vectors. The corresponding crRNA was expressed
using a PCR fragment containing a U6 promoter fused to the crRNA sequence. Transfected cells were analyzed using either Surveyor nuclease assay or targeted
deep sequencing.

(B) Schematic showing the sequence of DNMT1-targeting crRNA 3. Sequencing reads show representative indels.

(C) Comparison of in vitro and in vivo cleavage activity. The DNMT1 target region was PCR amplified, and the genomic fragment was used to test Cpf1-mediated
cleavage. All eight Cpf1-family proteins showed DNA cleavage in vitro (top), but only candidates 7 (AsCpf1) and 13 (Lb3Cpf1) facilitated robust indel formation in
human cells.

(D) Cpf1 and SpCas9 target sequences in the human DNMT1 and EMX1 loci.

(E) Comparison of Cpf1 and SpCas9 genome-editing efficiency. Target sites correspond to sequences shown in Figure 7D.

See also Figure S7. 



Synthesis of crRNAs and sgRNAs 

All crRNAs and sgRNAs used in biochemical reactions were synthesized using 
the HiScribe T7 High Yield RNA Synthesis Kit (NEB). ssDNA oligos (Table S2) 
corresponding to the reverse complement of the target RNA sequence 
were synthesized from IDT and annealed to a short T7 priming sequence. T7 
transcription was performed for 4 hr, and then RNA was purified using the 
MEGAclear Transcription Clean-Up Kit (Ambion). 

Purification of Cpf1 Protein 

FnCpf1 protein was cloned into a bacterial expression vector (6-His-MBP-TEV-Cpf1,
a pET-based vector generously provided by Doug Daniels). Two
liters of Terrific Broth growth media with 100 μg/ml ampicillin were inoculated
with 10 ml overnight culture of Rosetta (DE3) pLysS (EMD Millipore) cells containing
the Cpf1 expression construct. Growth media plus inoculant was
grown at 37°C until the cell density reached 0.2 OD600, then the temperature
was decreased to 21°C. Growth was continued until OD600 reached 0.6, when
a final concentration of 500 μM IPTG was added to induce MBP-Cpf1 expression.
The culture was induced for 14-18 hr before harvesting cells and freezing
at -80°C until purification.

Cell paste was resuspended in 200 ml of Lysis Buffer (50 mM HEPES [pH
7], 2 M NaCl, 5 mM MgCl2, 20 mM imidazole) supplemented with protease
inhibitors (Roche complete, EDTA-free) and lysozyme (Sigma). Once homogenized,
cells were lysed by sonication (Branson Sonifier 450) and then centrifuged
at 10,000 × g for 1 hr to clear the lysate. The lysate was filtered
through 0.22 micron filters (Millipore, Stericup) and applied to a nickel column
(HisTrap FF, 5 ml), washed, and then eluted with a gradient of imidazole.
Fractions containing protein of the expected size were pooled, TEV protease
(Sigma) was added, and the sample was dialyzed overnight into TEV buffer
(500 mM NaCl, 50 mM HEPES [pH 7], 5 mM MgCl2, 2 mM DTT). After dialysis,
TEV cleavage was confirmed by SDS-PAGE, and the sample was concentrated
to 500 μl prior to loading on a gel filtration column (HiLoad 16/600
Superdex 200) via FPLC (AKTA Pure). Fractions from gel filtration were
analyzed by SDS-PAGE; fractions containing Cpf1 were pooled and concentrated
to 200 μl and either used directly for biochemical assays or frozen at
-80°C for storage. Gel filtration standards were run on the same column
equilibrated in 2 M NaCl, HEPES (pH 7.0) to calculate the approximate size
of FnCpf1.

Generation of Cpf1 Protein Lysate 

Cpf1 proteins codon optimized for human expression were synthesized with
a C-terminal nuclear localization tag and cloned into the pcDNA3.1 expression
plasmid by Genscript (Table S1). 2,000 ng of Cpf1 expression plasmids were
transfected into 6-well plates of HEK293FT cells at 90% confluency using Lipofectamine
2000 reagent (Life Technologies). 48 hr later, cells were harvested
by washing once with DPBS (Life Technologies) and scraping in lysis buffer
(20 mM HEPES [pH 7.5], 100 mM KCl, 5 mM MgCl2, 1 mM DTT, 5% glycerol,
0.1% Triton X-100, 1× cOmplete Protease Inhibitor Cocktail Tablets [Roche]).
Lysate was sonicated for 10 min in a Bioruptor sonicator (Diagenode) and then
centrifuged. Supernatant was frozen for subsequent use in in vitro cleavage
assays.

In Vitro Cleavage Assay 

Cleavage in vitro was performed either with purified protein (25 nM) or 
mammalian lysate with protein at 37°C in cleavage buffer (NEBuffer 3, 5 mM 
DTT) for 20 min. The cleavage reaction used 500 ng of synthesized crRNA or 
sgRNA and 200 ng of target DNA. Target DNA involved either protospacers 
cloned into pUC19 or PCR amplicons of gene regions from genomic DNA iso- 
lated from HEK293 cells. Reactions were cleaned up using PCR purification 
columns (QIAGEN) and were run on 2% agarose E-gels (Life Technologies). 
For native and denaturing gels to analyze cleavage by nuclease mutants, 
cleaned-up reactions were run on TBE 6% polyacrylamide or TBE-Urea 6% 
polyacrylamide gels (Life Technologies). 

In Vitro Cpf1-Family Protein PAM Screen

In vitro cleavage reactions with Cpf1-family proteins were run on 2% agarose
E-gels (Life Technologies). Bands corresponding to un-cleaved target were gel
extracted using QIAquick Gel Extraction Kit (QIAGEN), and the target PAM region
was amplified and sequenced using a MiSeq (Illumina) with a single-end
150 cycle kit. Sequencing results were entered into the PAM discovery
pipeline.

Western Blot Analysis 

Cells were lysed in 1× RIPA buffer (Cell Signaling Technology) supplemented
with protease inhibitor cocktail (Roche). Equal volumes of cell lysate were 
run on BOLT 4%-12% Bis-Tris gradient gels (Invitrogen) and transferred 
to PVDF membranes (Millipore). Non-specific antigen binding was blocked 
with TBS-T (50 mM Tris, 150 mM NaCl and 0.05% Tween-20) with 5%
BLOT-QuickBlocker Reagent (Millipore) for 1 hr. Membranes were incubated 
with primary antibodies (anti-HA-tag [Cell Signaling Technology C29F4] 
or HRP-conjugated GAPDH [Cell Signaling Technology 14C10]) for 1 hr in 
TBS-T with 1% BLOT-QuickBlocker. Membranes were washed for three 
10 min washes and anti-HA-tag membranes were further incubated with 
anti-rabbit antibody (Cell Signaling Technology 7074) for 1 hr followed by 
six 10 min washes in TBS-T. Proteins were visualized with West Pico Chemi- 
luminescent Substrate (Life Technology) and imaged using the ChemiDoc 
MP Imaging System (Bio-Rad) and processed with ImageLab software 
(Bio-Rad). 

SURVEYOR Nuclease Assay for Genome Modification 

PCR amplicons comprised of a U6 promoter driving expression of the crRNA 
sequence were generated using Herculase II (Agilent Technologies) and 
appropriate U6 reverse primers (Table S2). 400 ng of Cpf1 expression plasmids
and 100 ng of the U6::crRNA expression cassettes were transfected
into 24-well plates of HEK293FT cells at 75%-90% confluency using Lipofect- 
amine 2000 (Life Technologies). 



Cells were incubated at 37°C for 72 hr post-transfection before genomic
DNA extraction. Genomic DNA was extracted using the QuickExtract DNA
Extraction Solution (Epicentre) following the manufacturer’s protocol. The
genomic region flanking the CRISPR target site for each gene was PCR amplified,
and products were purified using QIAquick Spin Column (QIAGEN)
following the manufacturer’s protocol. 200-500 ng total of the purified PCR
products were mixed with 1 μl 10× Taq DNA Polymerase PCR buffer (Enzymatics)
and ultrapure water to a final volume of 10 μl and were subjected to
a re-annealing process to enable heteroduplex formation: 95°C for 10 min,
95°C to 85°C ramping at -2°C/s, 85°C to 25°C at -0.25°C/s, and 25°C hold
for 1 min. After re-annealing, products were treated with SURVEYOR nuclease
and SURVEYOR enhancer S (Integrated DNA Technologies) following the
manufacturer’s recommended protocol and analyzed on 4%-20% Novex
TBE polyacrylamide gels (Life Technologies). Gels were stained with
SYBR Gold DNA stain (Life Technologies) for 10 min and imaged with a Gel
Doc gel imaging system (Bio-Rad). Quantification was based on relative band
intensities. Indel percentage was determined by the formula 100 × (1 -
sqrt(1 - (b + c)/(a + b + c))), where a is the integrated intensity of the undigested
PCR product, and b and c are the integrated intensities of each cleavage
product.
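
As a worked illustration of the formula above, the following Python sketch computes the indel percentage from three band intensities. The function name and the example intensities are hypothetical. The square root is commonly explained by random re-annealing: if a fraction p of strands carries an indel and mutant strands are assumed to mismatch with any partner, the uncleaved fraction is (1 - p)^2, which rearranges to the expression in the text. That rationale is a standard interpretation and is not stated in the source.

```python
import math


def surveyor_indel_percent(a, b, c):
    """Indel percentage from SURVEYOR band intensities.

    a: integrated intensity of the undigested PCR product
    b, c: integrated intensities of the two cleavage products
    """
    total = a + b + c
    if total <= 0:
        raise ValueError("band intensities must sum to a positive value")
    cleaved_fraction = (b + c) / total
    return 100.0 * (1.0 - math.sqrt(1.0 - cleaved_fraction))


# Hypothetical intensities: 25% of the signal is in the cleavage bands.
example = surveyor_indel_percent(75.0, 15.0, 10.0)
print(round(example, 2))
```

Note that the estimate is not linear in the cleaved fraction: a gel with a quarter of its signal in cleavage bands corresponds to an indel frequency well below 25%.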

Deep Sequencing to Characterize Cpf1 Indel Patterns in 293FT Cells

HEK293FT cells were transfected and harvested as described for assessing
activity of Cpf1 cleavage. The genomic regions flanking DNMT1 targets were
amplified using a two-round PCR to add Illumina P5 adapters as well
as unique sample-specific barcodes to the target amplicons. PCR products
were run on 2% E-gel (Invitrogen) and gel extracted using QIAquick Spin Column
(QIAGEN) as per the manufacturer’s recommended protocol. Samples
were pooled and quantified by Qubit 2.0 Fluorometer (Life Technologies).
The prepared cDNA libraries were sequenced on a MiSeq with a single-end
300 cycle kit (Illumina). Indels were mapped using a Python implementation
of the Geneious 6.0.3 Read Mapper.

Computational Analysis of Cpf1 loci 

The PSI-BLAST program (Altschul et al., 1997) was used to identify Cpf1 homologs
in the NCBI NR database using several known Cpf1 sequences as queries, with
the E-value cut-off of 0.01 and low-complexity filtering and
composition-based statistics turned off. The TBLASTN program with the
E-value cut-off of 0.01 and low-complexity filtering turned off was used to
search the NCBI WGS database using the Cpf1 profile (Makarova et al.,
2015) as the query. Results of all searches were combined (Table S3). The
HHpred program was used with default parameters (Söding et al., 2006) to
identify remote sequence similarity using a subset of representative Cpf1
sequences as queries. Multiple sequence alignments were constructed using
MUSCLE (Edgar, 2004) with manual correction based on pairwise alignments 
obtained using PSI-BLAST and HHpred programs. Phylogenetic analysis was 
performed using the FastTree program with the WAG evolutionary model 
and the discrete gamma model with 20 rate categories (Price et al., 2010). Pro- 
tein secondary structure was predicted using Jpred 4 (Drozdetskiy et al., 
2015). CRISPR repeats were identified using PILER-CR (Edgar, 2007) and 
CRISPRfinder (Grissa et al., 2007). 

Due to a production error, a label in Figure 1 of this BenchMarks article was incorrect. The DMA element between DAS and PolyA

Restriction factor | Virus(es) targeted | Mechanism(s) of restriction | Viral antagonist(s) or evasion mechanism(s) | IFN-inducible | Positive selection
--- | --- | --- | --- | --- | ---
IFITM family | Retro-, orthomyxo-, flavi-, filo-, corona-, rhabdo-, bunya-, reoviruses | Inhibits membrane fusion, modification of lipid components, or membrane fluidity | None known | Some | Some
SERINC3, 5 | Retroviruses (HIV, SIV, MLV, EIAV) | Reduces membrane fusion | Nef (HIV, SIV), Glyco-Gag (MLV), S2 (EIAV) | N | N
CH25H | Flavi-, retro-, filo-, bunya-, rhabdo-, herpesviruses | Inhibits membrane fusion by generating 25-hydroxycholesterol | None known | Y | Unknown
TRIM5α, TRIM-Cyp | Retroviruses (HIV, SIV, MLV, EIAV) | Accelerates uncoating, thereby inhibiting reverse transcription | Capsid mutation | Y | Y
Fv1 | MLV | Targets the viral capsid protein and interferes with uncoating | Capsid mutation | N | Y
APOBEC3 family | Hepadna-, retroviruses | Induces hypermutation by deamination, inhibits reverse transcription of HIV by binding to RNA and suppressing tRNA3Lys priming | Vif (lentiviruses), Bet (spumaviruses), Gag (gammaretroviruses) | Some | Some
SAMHD1 | Retroviruses | Hydrolyzes cellular dNTP and degrades viral RNA | Vpx (HIV-2, some SIV), Vpr (some SIV) | Y | Y
MxA | IFLUAV, THOV | Inhibits vRNP nuclear import | Nucleoprotein mutations (pandemic IFLUAV) | Y | Y
MxB | HIV, SIV | Prevents integration of proviral DNA by inhibiting uncoating, nuclear uptake, and/or integrity/stability of the PIC | Capsid mutation | Y | Y
KAP1/TRIM28 | HIV-1 | Induces deacetylation of HIV integrase | None known | N | N
KAP1/TRIM28 | Herpes-, retroviruses | Silences transcription and induces latency | vPK (KSHV) | N | N
Viperin/RSAD2 | HCV, DENV | Inhibits formation of the HCV replicon complex by sequestration of hVAP-33 and interaction with NS5A, interacts with NS3 (DENV) | None known | Y | Y
CH25H | HCV | Inhibits membranous web formation and NS5A dimerization | None known | Y | Unknown
IFI16 | HPV, HCMV, HSV1 | Accumulates on the viral genome and prevents association of transcriptional activators, induces heterochromatin formation | pUL97, pUL83 (HCMV) | Y | Y
MxA | Bunyaviruses (LACV, RVFV, BUNV) | Sequesters newly synthesized viral N protein into perinuclear complexes | None known | Y | Y
RNaseL (+OAS1) | Picorna-, flavi-, toga-, corona-, reo-, pox-, orthomyxo-, paramyxo-, herpes-, retro-, rhabdo-, hepadna-, polyomaviruses | Degrades viral (m)RNA, RNaseL is activated by 2’-5’-linked oligoadenylates produced by OAS1 | NS1 (IFLUAV), E3L, D9, D10 (VACV), σ3 (ReoV), Tat (HIV), ns2 (murine hepatitis virus), VP3 (RotaV), L* (Theiler’s virus), hairpin RNA structure (poliovirus), genome adaptation (HCV) | Y | Y
SAMHD1 | Arteri-, pox-, herpesviruses | Hydrolyzes cellular dNTP and degrades viral RNA | None known | Y | Y
APOBEC3 family | Herpes-, papillomaviruses | Induces hypermutation by deamination | None known | Some | Some
PKR | Herpes-, orthomyxo-, retro-, flavi-, reo-, adeno-, poxviruses | Inhibits mRNA translation by eIF2α phosphorylation | NS1 (IFLUAV), E2, NS5A (HCV), TRS1, IRS1 (HCMV), K3L, E3L (VACV), US11 (HSV1), vIRF-2, LANA2 (KSHV), NSs (RVFV), σ3, a4 (ReoV), SM, EBER-1 (EBV), Tat (HIV), VAI RNAs (AV), C8L, K3L (SwinepoxV), Nsp3 (RotaV), γ(1)34.5 (HSV-1) | Y | Y
SLFN11 | HIV, other retroviruses | Inhibits viral protein synthesis by altering tRNA function | None known | Y | Y
ZAP | Retro-, filo-, hepadna-, togaviruses | Recruits RNA exosome complex to degrade viral RNA | None known | Y | Y
IFIT family | Flavi-, bunya-, rhabdo-, orthomyxo-, picorna-, coronaviruses | Inhibits cap- and IRES-dependent translation by binding to eIF3 (HCV), binding and degradation of PPP-RNA (RVFV, VSV, IFLUAV) and RNA lacking 2’-O methylation (WNV, JEV) | 2’-O methylation of viral RNA (WNV, SARS-CoV, VACV), hairpin structures near the 5’ ends of viral RNA (VEEV), masking of the 5’ end by VPg (EMCV) | Y | Y
HERC5 (+ISG15) | HIV, MLV, HPV, IFLUAV | Inhibits HIV and MLV assembly by ISGylation of Gag, ISGylation of IFLUAV NS1 and HPV L1 capsid reduces infectious virus yield | NS1 (?) (IFLUAV) | Y | Y
Tetherin/BST2/CD317 | Retro-, flavi-, filo-, rhabdo-, herpes-, corona-, paramyxo-, arena-, toga-, hepadnaviruses | Prevents virus release by tethering budding progeny virions to the plasma membrane of the infected cell | Vpu (HIV-1 M/N, SIVgsn/mon/mus), Nef (most SIV, HIV-1 O), K5 (KSHV), Env (HIV-2, EBOV, MARV, SIVagm), Nsp1 (CHIKV), gM (HSV-1), HA/NA (pandemic IFLUAV), F/HN (SeV), HBs (HBV) | Y | Y

774 Cell 163, October 22, 2015 ©2015 Elsevier Inc. DOI http://dx.doi.org/10.1016/j.cell.2015.10.019



See online version for legend and references. 




SnapShot: Antiviral Restriction Factors

Silvia F. Kluge, Daniel Sauter, and Frank Kirchhoff 

Institute of Molecular Virology, Ulm University Medical Center, 89081 Ulm, Germany



Restriction factors are cellular proteins that inhibit viral replication and represent a first line of defense against viral pathogens. They show an enormous structural and functional
diversity and target almost every step of the viral replication cycle. Although there is no unambiguous definition of restriction factors (Doyle et al., 2015), these proteins frequently
share several characteristics: they are germ-line encoded, cell-intrinsic proteins that can be found in almost all cell types. While their expression is often upregulated by interferons
(IFNs), many of them are constitutively expressed, allowing them to act very early during viral infection. Restriction factors frequently target conserved viral components, such
as the viral genomes or membranes, and may thus be active against diverse viral families. Notably, some of them are so-called moonlighting proteins, also exhibiting biological
functions outside of immunity. In some cases, restriction of viral replication may result from a cell-regulatory function rather than direct interference with the viral replication cycle.
Viruses have evolved sophisticated means to evade or directly counteract many restriction factors. As a consequence of the continuous arms race with their viral antagonists,
restriction factors usually evolve rapidly and show evolutionary signatures of adaptation. Sites under positive selection often directly interact with viral components, either to target
them for inhibition or because they are being targeted by viral antagonists. As a consequence of virus-host adaptation, restriction factors are usually less effective against viruses
in their natural hosts but represent potent barriers against cross-species transmissions. Finally, their specific interaction with viral components allows some restriction factors to
act as pattern recognition receptors that do not only directly inhibit viral pathogens, but also sense them to induce antiviral immune responses.

The term “restriction factor” was established in the early 1970s, when researchers discovered that expression of Fv1 protects mice against infection by an otherwise lethal
dose of MLV (Lilly, 1970). Later, it became evident that primate lentiviruses, such as HIV-1, are subject to similar restrictions. A functional screen for suppressors of HIV-1 identified
rhesus TRIM5α as a potent inhibitor and determinant of retroviral species specificity (Stremlau et al., 2004). Similar to Fv1, TRIM5α and the related TRIM-CypA protein target
incoming retroviral capsids and block viral replication by preventing viral cDNA synthesis. Other well-characterized retroviral restriction factors include APOBEC3G, Tetherin, and
SAMHD1. APOBEC3G is a cytidine deaminase that is packaged into viral particles and inhibits viral cDNA synthesis by affecting the processivity of reverse transcription and by
causing inactivating G-to-A hypermutations in the proviral genome (Sheehy et al., 2002). Tetherin inhibits the release of budding progeny virions because one of its two membrane
anchors is inserted into the viral envelope while the other remains in the cell membrane (Van Damme et al., 2008; Neil et al., 2008). SAMHD1 suppresses reverse transcription in
non-dividing cells by depleting dNTPs, which are required for effective cDNA synthesis, and perhaps also by degrading viral RNA (Hrecka et al., 2011; Laguette et al., 2011). With
the exception of TRIM5α, which is evaded by viral capsid mutations, these restriction factors are all counteracted by accessory proteins of HIV and related lentiviruses: APOBEC3
proteins by Vif, Tetherin by Vpu of pandemic HIV-1 group M as well as Nef of many other primate lentiviruses, and SAMHD1 by HIV-2 and SIV Vpx or Vpr proteins. Very recently,
SERINC5 and SERINC3 have been identified as the enigmatic factors that impair the infectivity of HIV and SIV particles and are antagonized by the viral protein Nef (Rosa et al.,
2015; Usami et al., 2015).

Cellular proteins inhibiting HIV-1 have received enormous research interest, and a variety of additional antiviral factors, such as IFITM proteins, CH25H, KAP1/TRIM28, 90K,
MOV10, MxB, SLFN11, and ZAP have been described. The discovery of all of these factors has relevance far beyond HIV/AIDS and other retroviruses because many of them have
broad antiviral activity. For example, Tetherin suppresses the release of a large variety of enveloped viruses, including filo-, rhabdo-, arena-, and herpesviruses. Similarly, IFITMs
and CH25H may impair virion infectivity of diverse virus families by altering the lipid composition of the viral membrane. Another striking example of a broadly active antiviral protein
is PKR. This kinase inhibits viral mRNA translation by inhibiting the initiation factor eIF2α.

The definition of a “real” restriction factor is intensively debated. Viruses are interacting with and hijacking hundreds of cellular proteins to ensure efficient viral replication. Thus,
overexpression or knockdown of many cellular factors may result in the identification of proteins with putative antiviral effects. Moreover, only a minority of the antiviral factors
described to date show all features reported to be characteristic for a restriction factor. In fact, antiviral proteins without any (known) viral antagonist or evasion mechanism (e.g.,
IFITMs and SLFN11) have been proposed to be called “resistance factors” (Doyle et al., 2015). Here, we more broadly apply the term “restriction factor” to intrinsic cellular factors
known to display antiviral activity. We apologize to both the purists who apply criteria that are more stringent and to all of the scientists who discovered interesting antiviral factors
that we did not mention. We are only just beginning to understand the enormous diversity of antiviral factors and the highly sophisticated ways exploited by viruses to antagonize
or evade them. No matter which definition of a restriction factor we apply, there will certainly be discoveries of novel antiviral proteins that will not satisfy the criteria.

ABBREVIATIONS 

Antiviral factors: IFITM, interferon-induced transmembrane protein; SERINC, serine incorporator; CH25H, cholesterol 25-hydroxylase; TRIM, tripartite motif-containing protein; Fv1,
Friend virus susceptibility-1; APOBEC3, apolipoprotein B mRNA-editing enzyme, catalytic polypeptide-like 3; SAMHD1, SAM domain and HD domain-containing protein 1; MxA,
myxovirus resistance gene A; MxB, myxovirus resistance gene B; KAP1, KRAB-associated protein 1; RSAD2, radical S-adenosyl methionine domain-containing 2; IFI16, interferon-inducible
protein 16; OAS1, 2’-5’-oligoadenylate synthetase 1; PKR, (ds)RNA-dependent protein kinase R; SLFN11, Schlafen family member 11; ZAP, zinc-finger antiviral protein;
IFIT, interferon-induced protein with tetratricopeptide repeats; HERC5, HECT and RLD domain-containing E3 ubiquitin protein ligase 5; ISG15, interferon-stimulated gene 15; BST2,
bone marrow stromal cell antigen 2.

Viruses: HIV, human immunodeficiency virus; SIV, simian immunodeficiency virus; MLV, murine leukemia virus; EIAV, equine infectious anemia virus; IFLUAV, influenza A virus; THOV,
Thogoto virus; HCV, hepatitis C virus; DENV, Dengue virus; HPV, human papilloma virus; HCMV, human cytomegalovirus; HSV1, herpes simplex virus 1; LACV, La Crosse encephalitis
virus; RVFV, Rift Valley fever virus; BUNV, Bunyamwera virus; VSV, vesicular stomatitis virus; WNV, West Nile virus; JEV, Japanese encephalitis virus; KSHV, Kaposi’s sarcoma-associated
herpesvirus; VACV, vaccinia virus; ReoV, reovirus; EBV, Epstein-Barr virus; RotaV, rotavirus; AV, adenovirus; VEEV, Venezuelan equine encephalitis virus; SARS-CoV, severe acute
respiratory syndrome coronavirus; EMCV, encephalomyocarditis virus; MARV, Marburg virus; CHIKV, Chikungunya virus; SeV, Sendai virus; HBV, hepatitis B virus; EBOV, Ebola virus.

Viral proteins: Nef, negative factor; NS5A, nonstructural protein 5A; Vif, viral infectivity factor; Vpr, viral protein R; Vpu, viral protein unknown; Vpx, viral protein X; Env, envelope; vIRF-2,
viral IRF2-like protein; US11, tegument protein unique short 11.

Other: eIF2α, eukaryotic translation initiation factor 2α; eIF3, eukaryotic translation initiation factor 3.